Abstract. Algorithmic debugging is a semi-automatic debugging technique
that allows the programmer to precisely identify the location of bugs
without the need to inspect the source code. The technique has been
successfully adapted to all paradigms, and mature implementations have been
released for languages such as Haskell, Prolog or Java. For three decades,
the algorithm introduced by Shapiro and later improved by Hirunkitti has
been thought optimal. In this paper we first show that this algorithm is not
optimal and, moreover, that in some situations it is unable to find all
possible solutions; thus it is incomplete. Then, we present a new version of
the algorithm that is proven optimal, and we introduce some equations that
allow the algorithm to identify all optimal solutions.
Debugging is one of the most important but least automated (and, thus,
time-consuming) tasks in the software development process. The programmer is
often forced to manually explore the code or iterate over it using, e.g.,
breakpoints, and this process usually requires a deep understanding of the
source code to find the bug. Algorithmic debugging [16] is a semi-automatic
debugging technique that has been extended to practically all paradigms [17].
Recent research has improved the scalability of the technique, producing
scalable and mature debuggers. The technique is based on the answers of the
programmer to a series of questions generated automatically by the algorithmic
debugger. Each question asks whether the result of a subcomputation, for
given input values, is actually correct. The answers provide the debugger
with information about the correctness of some (sub)computations of a given
program, and the debugger uses them to guide the search for the bug until a
buggy portion of code is isolated.



* This work has been partially supported by the Spanish Ministerio de Ciencia e 
Innovacion under grant TIN2008-06622-C03-02 and by the Generalitat Valenciana 
under grant PROMETEO/2011/052. 
Example 1. Consider this simple Haskell program, inspired by a similar example
by |B]. It is a buggy implementation of the Insertion Sort algorithm:

main = insort [2,1,3]

insort []     = []
insort (x:xs) = insert x (insort xs)

insert x []     = [x]
insert x (y:ys) = if x>=y then (x:y:ys)
                  else (y:(insert x ys))

An algorithmic debugging session for this program is the following (YES and NO 
answers are provided by the programmer): 

Starting Debugging Session...

(1) insort [1,3] = [3,1]? NO
(2) insort [3] = [3]? YES
(3) insert 1 [3] = [3,1]? NO
(4) insert 1 [] = [1]? YES

Bug found in rule:
insert x (y:ys) = if x>=y then _ else (y:(insert x ys))

The debugger points out the part of the code that contains the bug. In this case 
x>=y should be x<=y. Note that, to debug the program, the programmer only 
has to answer questions. It is not even necessary to see the code. 
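
For reference, the repaired program can be sketched as follows. This is only an illustration of the fix named above (replacing x>=y with x<=y); the type signatures are an addition for readability and are not part of the original example.

```haskell
-- Corrected version of the program in Example 1. The only change is the
-- comparison in the second rule of insert (x <= y instead of x >= y).
insort :: [Int] -> [Int]
insort []     = []
insort (x:xs) = insert x (insort xs)

insert :: Int -> [Int] -> [Int]
insert x []     = [x]
insert x (y:ys) = if x <= y then x : y : ys
                  else y : insert x ys
```

With this change, insort [2,1,3] evaluates to [1,2,3].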

Typically, algorithmic debuggers have a front-end that produces a data
structure representing a program execution, the so-called execution tree (ET)
[14], and a back-end that uses the ET to ask questions and process the answers
of the programmer to locate the bug. For instance, the ET of the program in
Example 1 is depicted in Figure 1.



main = [3,2,1]
└── insort [2,1,3] = [3,2,1]
    ├── insert 2 [3,1] = [3,2,1]
    │   └── insert 2 [1] = [2,1]
    └── insort [1,3] = [3,1]
        ├── insert 1 [3] = [3,1]
        │   └── insert 1 [] = [1]
        └── insort [3] = [3]
            ├── insert 3 [] = [3]
            └── insort [] = []

Fig. 1. ET of the program in Example 1



The strategy used to decide what nodes of the ET should be asked is crucial 
for the performance of the technique. Since the definition of algorithmic debug- 
ging, there has been a lot of research concerning the definition of new strate- 
gies trying to minimize the number of questions [17]. We conducted several 
experiments to measure the performance of all current algorithmic debugging
strategies. The results of the experiments are shown in Figure 2, where the
first column contains the names of the benchmarks; column nodes shows the
number of nodes in the ET associated with each benchmark; and the other
columns represent algorithmic debugging strategies [17], ordered according to
their performance: Optimal Divide & Query (D&QO), Divide & Query by Hirunkitti
(D&QH), Divide & Query by Shapiro (D&QS), Divide by Rules & Query (DR&Q),
Heaviest First (HF), More Rules First (MRF), Hat Delta Proportion (HD-P),
Top-Down (TD), Hat Delta YES (HD-Y), Hat Delta NO (HD-N), Single Stepping (SS).



Benchmark         Nodes   D&QO   D&QH   D&QS   DR&Q     HF    MRF   HD-P     TD   HD-Y   HD-N     SS  Average
NumReader            12  28,99  28,99  31,36  29,59  44,38  44,38  49,70  49,70  49,70  49,70  53,25    41,80
Orderings            46  12,04  12,09  12,63  14,40  17,16  17,29  21,05  20,82  20,60  19,60  51,02    19,88
Factoricer           62   9,83   9,83   9,93  20,03  12,55  12,55  15,04  12,55  15,04  18,29  50,77    16,94
Sedgewick            12  30,77  30,77  33,14  30,77  34,91  34,91  43,79  43,20  43,79  43,79  53,25    38,46
Clasifier            23  19,79  20,31  22,40  21,88  22,92  23,26  32,12  31,94  32,12  34,55  51,91    28,47
LegendGame           71   8,87   8,87   8,95  16,72  11,15  11,23  14,68  13,37  14,68  16,94  50,68    16,01
Cues                 18  31,58  32,41  32,41  32,41  33,24  34,63  39,06  42,11  39,06  44,32  52,35    37,60
Romanic             123   6,40  10,84  11,23  13,56   7,44  11,88  13,29  13,41  13,29  13,30  50,40    15,00
Fib Recursive     4.619   0,27   0,27   0,28   1,20   0,33   0,41   3,92   0,46   3,92   0,48  50,01     5,59
Risk                 33  16,78  16,78  18,08  19,38  18,69  18,69  24,31  31,14  24,31  32,79  51,38    24,76
FactTrans           198   3,89   3,89   3,93   6,22   6,58   6,58   7,37   7,16   7,24   7,50  50,25    10,06
Rnd Quicksort        72   8,73   8,73   8,73  11,41  12,03  12,23  13,62  13,51  12,93  14,54  50,67    15,19
BinaryArrays        128   5,52   5,52   5,71   7,13   7,75   7,94   7,90   8,59   8,15   8,71  50,38    11,21
FibFactAna          351   2,44   2,44   2,45   5,38   7,61   7,71   6,40   8,57   7,39   5,99  50,14     9,68
Newton Pol            7  39,06  39,06  43,75  39,06  43,75  43,75  45,31  45,31  45,31  45,31  54,69    44,03
RegresionTest        18  23,27  23,27  25,21  25,21  26,87  26,87  32,96  32,96  32,96  32,96  52,35    30,45
BoubleFibArrays     171   4,40   4,41   4,57  11,40   5,95   6,96  24,50   6,96  24,87   6,96  50,29    13,75
ComplexNumbers       60  10,02  10,02  10,32  11,31  11,39  11,39  15,78  15,75  15,80  19,19  50,79    16,53
Integral              5  44,44  44,44  47,22  44,44  50,00  50,00  50,00  50,00  50,00  50,00  55,56    48,74
TestMath             48  11,91  11,91  12,16  12,99  15,95  16,28  22,41  24,20  23,87  22,37  50,98    20,46
TestMath2           228   3,51   3,51   3,51   9,73  10,55  10,81  12,29  28,56  13,24  14,37  50,22    14,57
Figures             113   6,72   6,75   6,79   8,09   7,68   7,79  10,17  10,60  10,16  10,76  50,43    12,36
FactCalc             59  10,11  10,14  10,42  11,53  13,69  14,22  20,47  18,50  20,47  20,69  50,81    18,28
SpaceLimits         127  12,95  16,07  19,15  21,74  13,68  16,80  22,87  22,78  22,86  26,15  50,38    22,31
Argparser           129  12,10  12,10  13,08  20,48  13,07  13,32  15,98  15,98  15,98  15,98  50,38    18,04
Cglib             1.216   1,93   1,93   2,33   2,12   2,52   2,65   6,14   6,61   5,73   7,32  50,04     8,12
Kxml2             1.172   2,86   2,86   3,01   3,56   3,06   3,48   8,58   6,79   6,97   7,77  50,04     9,00
Javassist         1.357   4,34   4,34   5,44   4,49   4,74   4,75   6,20   5,86   9,26   6,06  50,04     9,59
Average          374,21  13,34  13,66  14,58  16,29  16,41  16,88  20,92  20,98  21,06  21,30  51,19    20,60



Fig. 2. Performance of algorithmic debugging strategies 



For each benchmark, we produced its associated ET and assumed that the
buggy node could be any node of the ET (i.e., any subcomputation in the
execution of the program could be buggy). Therefore, we performed a different
experiment for each possible case and, hence, each cell of the table
summarizes a number of experiments that were automated. In particular,
benchmark Factoricer has been debugged 62 times with each strategy; each time
we selected a different node and simulated that it was buggy. Thus the results
shown are the average number of questions asked by each strategy with respect
to the number of nodes (i.e., the mean percentage of nodes asked). Similarly,
benchmark Cglib has been debugged 1216 times with each strategy, and so on.

Observe that the best algorithmic debugging strategies in practice are the two
variants of Divide and Query (ignoring our new technique D&QO). Moreover,
from a theoretical point of view, this strategy has been thought optimal in the
worst case for almost 30 years, and it has been implemented in almost all
current algorithmic debuggers (see, e.g., [4, 5, 8, 15]). In this paper we
show that current algorithms for D&Q are suboptimal. We show the problems of
D&Q and solve them in a new improved algorithm that is proven optimal.
Moreover, the original strategy was only defined for ETs where all the nodes
have an individual weight of 1. In contrast, we allow our algorithms to work
with different individual weights, which can be integers but also decimals. An
individual weight of zero means that the node cannot contain the bug. A
positive individual weight approximates the probability of being buggy: the
higher the individual weight, the higher the probability. This generalization
strongly influences the technique and allows us to assign different
probabilities of being buggy to different parts of the program. For instance,
a recursive function with higher-order calls should be assigned a higher
individual weight than a function implementing a simple base case [17].
The weight of the nodes can also be reassigned dynamically during the debugging
session in order to take into account the oracle's answers [5].

We show that the original algorithms are inefficient with ETs where nodes 
can have different individual weights in the domain of the positive real numbers 
(including zero) and we redefine the technique for these generalized ETs. 

The rest of the paper is organized as follows. In Section 2 we recall and
formalize the strategy D&Q, and we show with counterexamples that it is
suboptimal and incomplete. Then, in Section 3 we introduce two new algorithms
for D&Q that are optimal and complete. Each algorithm is useful for a different
type of ET. Finally, Section 4 concludes. Proofs of technical results can be
found in the appendix.



2 D&Q by Shapiro vs. D&Q by Hirunkitti 

In this section we formalize the strategy D&Q to show the differences between
the original version by Shapiro [16] and the improved version by Hirunkitti and
Hogger [7]. We start with the definition of marked execution tree, that is, an
ET where some nodes could have been removed because they were marked as
correct (i.e., answered YES), some nodes could have been marked as wrong (i.e.,
answered NO) and the correctness of the other nodes is undefined.

Definition 1 (Marked Execution Tree). A marked execution tree (MET)
is a tree $T = (N, E, M)$ where $N$ are the nodes, $E \subseteq N \times N$ are
the edges, and $M : N \to V$ is a total marking function that assigns to all
the nodes in $N$ a value in the domain $V = \{Wrong, Undefined\}$.

Initially, all nodes in the MET are marked as Undefined. But with every
answer of the user, a new MET is produced. Concretely, given a MET
$T = (N, E, M)$ and a node $n \in N$, the answer of the user to the question
in $n$ produces a new MET such that: (i) if the answer is YES, then this node
and its subtree are removed from the MET; (ii) if the answer is NO, then all
the nodes in the MET are removed except this node and its descendants (see
footnote 1). Therefore, note that the only node that can be marked as Wrong is
the root. Moreover, the rest of the nodes can only be marked as Undefined
because when the answer is YES, the associated subtree is deleted from the MET.

Therefore, the size of the MET is gradually reduced with the answers. If we 
delete all nodes in the MET then the debugger concludes that no bug has been 
found. If, contrarily, we finish with a MET composed of a single node marked as 
wrong, this node is called the buggy node and it is pointed to as being responsible 
for the bug of the program. 

This whole process is defined in Algorithm 1, where function selectNode selects
a node in the MET to be asked to the user with function askNode. Therefore,
selectNode is the central point of this paper. In the rest of this section, we
assume that selectNode implements D&Q. In the following we use $E^*$ to refer
to the reflexive and transitive closure of $E$, and $E^+$ for the transitive
closure.



Algorithm 1 General algorithm for algorithmic debugging
Input: A MET $T = (N, E, M)$
Output: A buggy node or $\bot$ if no buggy node is detected
Preconditions: $\forall n \in N$, $M(n) = Undefined$
Initialization: $buggyNode = \bot$

begin
(1)  do
(2)      $node = selectNode(T)$
(3)      $answer = askNode(node)$
(4)      if ($answer = Wrong$)
(5)      then $M(node) = Wrong$
(6)           $buggyNode = node$
(7)           $N = \{n \in N \mid (node \to n) \in E^*\}$
(8)      else $N = N \setminus \{n \in N \mid (node \to n) \in E^*\}$
(9)  while ($\exists n \in N$, $M(n) = Undefined$)
(10) return $buggyNode$
end
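
The loop of Algorithm 1 can be sketched in Haskell as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rose tree type, the pure oracle of type a -> Answer (instead of an interactive askNode), the Boolean flag that records whether the current root is marked Wrong, and the helper names prune, subtree, debug and topDown are all hypothetical. The selection function is a parameter, so any strategy that returns an Undefined node can be plugged in; topDown is only a naive placeholder.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- A rose tree; each node carries a label that identifies its question.
data Tree a = Node a [Tree a]

-- The two answers assumed in the paper (no "I don't know").
data Answer = Correct | Wrong deriving Eq

-- Answer YES: drop the node and its whole subtree.
prune :: Eq a => a -> Tree a -> Maybe (Tree a)
prune x (Node y cs)
  | x == y    = Nothing
  | otherwise = Just (Node y (mapMaybe (prune x) cs))

-- Answer NO: keep only the node and its descendants.
subtree :: Eq a => a -> Tree a -> Maybe (Tree a)
subtree x t@(Node y cs)
  | x == y    = Just t
  | otherwise = listToMaybe (mapMaybe (subtree x) cs)

-- The do/while loop of Algorithm 1. The Bool records whether the
-- current root is marked Wrong; the result is the buggy node, if any.
debug :: Eq a => (Bool -> Tree a -> a) -> (a -> Answer) -> Tree a -> Maybe a
debug select ask = go False Nothing . Just
  where
    go _ buggy Nothing = buggy
    go rootWrong buggy (Just t@(Node _ cs))
      | rootWrong && null cs = buggy            -- no Undefined node left
      | otherwise =
          let n = select rootWrong t
          in case ask n of
               Wrong   -> go True (Just n) (subtree n t)
               Correct -> go rootWrong buggy (prune n t)

-- A deliberately naive selectNode (top-down): ask the root first and,
-- once it is Wrong, ask its first remaining child.
topDown :: Bool -> Tree a -> a
topDown True (Node _ (Node c _ : _)) = c
topDown _    (Node r _)              = r
```

The sketch assumes that labels are unique and that the selection function never returns a root already marked Wrong; otherwise the loop would not make progress.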



Both D&Q by Shapiro and D&Q by Hirunkitti assume that the individual
weight of a node is always 1. Therefore, given a MET $T = (N, E, M)$, the
weight of the subtree rooted at node $n \in N$, $w_n$, is defined recursively
as its number of descendants including itself (i.e.,
$1 + \sum \{w_{n'} \mid (n \to n') \in E\}$).

1 It is also possible to accept I don't know as an answer of the user. In this
case, the debugger simply selects another node [8]. For simplicity, we assume
here that the user only answers Correct or Wrong.

D&Q tries to simulate a dichotomic search by selecting the node that best
divides the MET into two subMETs with weights as similar as possible.
Therefore, given a MET with $n$ nodes, D&Q searches for the node whose weight
is closest to $\frac{n}{2}$. The original algorithm by Shapiro always selects:

- the heaviest node $n'$ whose weight is as close as possible to $\frac{n}{2}$
  with $w_{n'} \le \frac{n}{2}$

Hirunkitti and Hogger noted that this is not enough to divide the MET by half,
and their improved version always selects the node whose weight is closer to
$\frac{n}{2}$ between:

- the heaviest node $n'$ whose weight is as close as possible to $\frac{n}{2}$
  with $w_{n'} \le \frac{n}{2}$, or
- the lightest node $n'$ whose weight is as close as possible to $\frac{n}{2}$
  with $w_{n'} \ge \frac{n}{2}$

Because it is better, in the rest of the article we only consider Hirunkitti's
D&Q and refer to it as D&Q.
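
The two selection rules above can be sketched as follows. This is an illustrative sketch only: the Tree type and the function names label, weight, subtrees, shapiro and hirunkitti are hypothetical, every individual weight is taken to be 1 as in the text, and the Hirunkitti variant is expressed as "the node whose weight is closest to n/2", which picks one of the two candidates listed above. Ties are resolved by the library functions in an unspecified way for the purposes of this sketch.

```haskell
import Data.List (maximumBy, minimumBy)
import Data.Ord (comparing)

data Tree a = Node a [Tree a]

label :: Tree a -> a
label (Node x _) = x

-- Weight when every individual weight is 1: the node plus its descendants.
weight :: Tree a -> Int
weight (Node _ cs) = 1 + sum (map weight cs)

-- Every subtree of the MET, root first.
subtrees :: Tree a -> [Tree a]
subtrees t@(Node _ cs) = t : concatMap subtrees cs

-- Shapiro: the heaviest node whose weight does not exceed n/2.
-- Weights are doubled so the comparison stays in integers.
shapiro :: Tree a -> Maybe (Tree a)
shapiro t
  | null below = Nothing
  | otherwise  = Just (maximumBy (comparing weight) below)
  where
    n     = weight t
    below = [s | s <- subtrees t, 2 * weight s <= n]

-- Hirunkitti and Hogger: the node whose weight is closest to n/2,
-- whether it lies below or above the half.
hirunkitti :: Tree a -> Tree a
hirunkitti t = minimumBy (comparing distance) (subtrees t)
  where
    n          = weight t
    distance s = abs (2 * weight s - n)
```

For a MET of 7 nodes whose root has one child of weight 4 and two leaf children, shapiro can only return a node of weight 1, whereas hirunkitti returns the node of weight 4, which is closer to 3.5.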



2.1 Limitations of D&Q 

In this section we show that D&Q is suboptimal when the MET does not contain
a wrong node (i.e., all nodes are marked as undefined; see footnote 2). The
intuition behind this limitation is that the objective of D&Q is to divide the
tree in two, but the real objective should be to reduce the number of questions
to be asked to the programmer. For instance, consider the MET in Figure 3
(left), where the black node is marked as wrong and D&Q would select the gray
node. The objective of D&Q is to divide the 8 nodes into two groups of 4.
Nevertheless, the real motivation of dividing the tree should be to divide the
tree into two parts that would produce the same number of remaining questions
(in this case 3).

2 Modern debuggers [8] allow the programmer to debug the MET while it is being
generated. Thus the root node of the subtree being debugged is not necessarily
marked as Wrong.

The problem comes from the fact that D&Q does not take into account
the marking of wrong nodes. For instance, observe the two METs in Figure 3
(center), where each node is labeled with its weight and the black node is
marked as wrong. In both cases D&Q would behave in exactly the same way,
because it completely ignores the marking of the root. Nevertheless, it is
evident that we do not need to ask again about a node that is already marked as
wrong to determine whether it is buggy. However, D&Q counts the nodes marked as
wrong as part of their own weight, and this is a source of inefficiency.

Fig. 3. Behavior of Divide and Query

Figure 3 (center) shows two METs. In the one at the right, the nodes with
weight 1 and 2 are both optimal, but in the one at the left, only the node
with weight 2 is optimal. In both METs D&Q would select either the node with
weight 1 or the node with weight 2 (both are equally close to $\frac{3}{2}$).
However, we show in Figure 3 (right) that selecting node 1 is suboptimal, and
the strategy should always select node 2. Considering that the gray node is the
first node selected by the strategy, the number at the side of a node
represents the number of questions needed to find the bug if the buggy node is
this node. The number at the top of the figure represents the number of
questions needed to determine that there is no bug. Clearly, on average, it is
better to select first the node with weight 2 because we would ask fewer
questions ($\frac{8}{4}$ vs. $\frac{9}{4}$ considering all four possible cases).
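
The two averages can be reproduced by a direct count. The count below is a sketch under an assumption: since the figure is not reproduced here, the left MET of Figure 3 (center) is taken to be a chain of three undefined nodes with weights 3, 2 and 1. The four cases are a bug in node 1, a bug in node 2, a bug in node 3, and no bug; each term is the number of questions asked in that case, assuming that after the first question the node with weight 2 is asked before the root (asking them in the other order gives the same total).

```latex
\begin{align*}
\text{node 2 asked first:}\quad
  & \underbrace{2}_{\text{bug in 1}} + \underbrace{2}_{\text{bug in 2}}
  + \underbrace{2}_{\text{bug in 3}} + \underbrace{2}_{\text{no bug}} = 8
  \;\Rightarrow\; \tfrac{8}{4} = 2 \\
\text{node 1 asked first:}\quad
  & \underbrace{1}_{\text{bug in 1}} + \underbrace{2}_{\text{bug in 2}}
  + \underbrace{3}_{\text{bug in 3}} + \underbrace{3}_{\text{no bug}} = 9
  \;\Rightarrow\; \tfrac{9}{4} = 2.25
\end{align*}
```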

Therefore, D&Q returns a set of nodes that contains the best node, but it
is not able to determine which of them is the best, and it is suboptimal
whenever the best node is not the one selected. In addition, the METs in
Figure 4 show that D&Q is incomplete. Observe that the METs have 5 nodes, thus
D&Q would always select the node with weight 2. However, the node with weight 4
is equally optimal (both need the same number of questions on average to find
the bug), but it will never be selected by D&Q because its weight is far from
half of the tree, $\frac{5}{2}$.





Fig. 4. Incompleteness of Divide and Query 



Another limitation of D&Q is that it was designed to work with METs where
all the nodes have the same individual weight and, moreover, this weight is
assumed to be one. If we work with METs where nodes can have different
individual weights and these weights can be any value greater than or equal to
zero, then D&Q is suboptimal, as demonstrated by the MET in Figure 5. In this
MET, D&Q would select node $n_1$ because its weight is closer to half the
weight of the MET than that of any other node. However, node $n_2$ is the node
that better divides the tree in two parts with the same probability of
containing the bug.

In summary, (1) D&Q is suboptimal when the MET is free of wrong nodes, 
(2) D&Q is correct when the MET contains wrong nodes and all the nodes of the 
MET have the same weight, but (3) D&Q is suboptimal when the MET contains 
wrong nodes and the nodes of the MET have different individual weights. 
Fig. 5. MET with decimal individual weights



3 Optimal D&Q 

In this section we introduce a new version of D&Q that tries to divide the MET 
into two parts with the same probability of containing the bug (instead of two 
parts with the same weight). We introduce new algorithms that are correct and 
complete even if the MET contains nodes with different individual weights. For 
this, we define the search area of a MET as the set of undefined nodes. 

Definition 2 (Search area). Let $T = (N, E, M)$ be a MET. The search area
of $T$, $Sea(T)$, is defined as $\{n \in N \mid M(n) = Undefined\}$.

While D&Q uses the whole $T$, we only use $Sea(T)$, because answering all
nodes in $Sea(T)$ guarantees that we can discover all buggy nodes [9].
Moreover, in the following we refer to the individual weight of a node $n$
with $wi_n$, and we refer to the weight of a (sub)tree rooted at $n$ with
$w_n$, which is recursively defined as:

$$w_n = \begin{cases}
\sum \{w_{n'} \mid (n \to n') \in E\} & \text{if } M(n) \neq Undefined \\
wi_n + \sum \{w_{n'} \mid (n \to n') \in E\} & \text{otherwise}
\end{cases}$$

Note that, contrary to standard D&Q, the definition of $w_n$ excludes those
nodes that are not in the search area (i.e., the root node when it is wrong).
Note also that $wi_n$ allows us to assign any individual weight to the nodes.
This is an important generalization of D&Q, where it is assumed that all nodes
have the same individual weight and it is always 1.
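
The recursive definition of the generalized weight translates almost literally into code. The following is a small sketch; the MET and Mark types and the field order are assumptions made for illustration.

```haskell
data Mark = Wrong | Undefined deriving Eq

-- A MET node: its mark, its individual weight wi, and its children.
data MET = Node Mark Double [MET]

-- w_n: a node outside the search area (mark other than Undefined)
-- contributes nothing itself, but its children still count.
weight :: MET -> Double
weight (Node mark wi children)
  | mark /= Undefined = below
  | otherwise         = wi + below
  where below = sum (map weight children)
```

For a root marked Wrong with two Undefined children of individual weights 1 and 0, the first of which has a child of individual weight 0.5, the weight of the root is 1.5: the individual weight of the root itself is ignored.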

3.1 Debugging ETs where all nodes have the same individual
weight $wi \in \mathbb{R}^+$

For the sake of clarity, given a node $n \in Sea(T)$, we distinguish between
three subareas of $Sea(T)$ induced by $n$: (1) $n$ itself, whose individual
weight is $wi_n$; (2) descendants of $n$, whose weight is

$$Down(n) = \sum \{wi_{n'} \mid n' \in Sea(T) \wedge (n \to n') \in E^+\}$$

and (3) the rest of nodes, whose weight is

$$Up(n) = \sum \{wi_{n'} \mid n' \in Sea(T) \wedge (n \to n') \notin E^*\}$$

Fig. 6. Functions Up and Down

Example 2. Consider the MET in Figure 6. Assuming that the root $n$ is marked
as wrong and all nodes have an individual weight of 1, then $Sea(T)$ contains
all nodes except $n$, $Up(n') = 4$ (total weight of the gray nodes), and
$Down(n') = 3$ (total weight of the white nodes).

Clearly, for any MET whose root is $n$ and any node $n'$ with
$M(n') = Undefined$, we have that:

$$w_n = Up(n') + Down(n') + wi_{n'} \quad \text{(Equation 1)}$$

$$w_{n'} = Down(n') + wi_{n'} \quad \text{(Equation 2)}$$
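
The functions Up and Down, and the two equations relating them to the weights, can be checked with a small sketch. The representation below (a MET type with unique string names, and the helper own) is an assumption made for illustration; the target node is located by name.

```haskell
data Mark = Wrong | Undefined deriving Eq

-- A MET node: a (unique) name, its mark, its individual weight, its children.
data MET = Node String Mark Double [MET]

-- Contribution of a single node to the search area.
own :: MET -> Double
own (Node _ mark wi _) = if mark == Undefined then wi else 0

-- w_n: total weight of the search area inside the subtree.
weight :: MET -> Double
weight t@(Node _ _ _ cs) = own t + sum (map weight cs)

-- Down(target): search-area weight strictly below the target node.
down :: String -> MET -> Double
down target (Node name _ _ cs)
  | name == target = sum (map weight cs)
  | otherwise      = sum (map (down target) cs)

-- Up(target): search-area weight outside the target's subtree.
up :: String -> MET -> Double
up target t@(Node name _ _ cs)
  | name == target = 0
  | otherwise      = own t + sum (map (up target) cs)
```

On any MET with the quantities of Example 2 (a wrong root, a node n' with three descendants, and four other nodes, all with individual weight 1), up returns 4 and down returns 3 for n', and the weight of the root is 4 + 3 + 1 = 8, as stated by Equation 1.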

Intuitively, given a node $n$, what we want to divide by half is the area
formed by $Up(n) + Down(n)$. That is, $n$ will not be part of $Sea(T)$ after it
has been answered, thus the objective is to make $Up(n)$ equal to $Down(n)$.
This is another important difference with traditional D&Q: $wi_n$ should not be
considered when dividing the MET. We use the notation $n_1 \gg n_2$ to express
that $n_1$ divides $Sea(T)$ better than $n_2$ (i.e.,
$|Down(n_1) - Up(n_1)| < |Down(n_2) - Up(n_2)|$). And we use $n_1 \equiv n_2$
to express that $n_1$ and $n_2$ equally divide $Sea(T)$. If we find a node $n$
such that $Up(n) = Down(n)$ then $n$ produces an optimal division, and should
be selected by the strategy. If an optimal solution cannot be found, the
following theorem states how to compare the nodes in order to decide which of
them should be selected.

Theorem 1. Given a MET $T = (N, E, M)$ whose root is $n \in N$, where
$\forall n', n'' \in N$, $wi_{n'} = wi_{n''}$ and $\forall n' \in N$,
$wi_{n'} > 0$, and given two nodes $n_1, n_2 \in Sea(T)$ with
$w_{n_1} > w_{n_2}$, $n_1 \gg n_2$ if and only if
$w_n > w_{n_1} + w_{n_2} - wi_n$.

Theorem 2. Given a MET $T = (N, E, M)$ whose root is $n \in N$, where
$\forall n', n'' \in N$, $wi_{n'} = wi_{n''}$ and $\forall n' \in N$,
$wi_{n'} > 0$, and given two nodes $n_1, n_2 \in Sea(T)$ with
$w_{n_1} > w_{n_2}$, $n_1 \equiv n_2$ if and only if
$w_n = w_{n_1} + w_{n_2} - wi_n$.

Theorem 1 is useful when one node is heavier than the other. In the case that
both nodes have the same weight, the following theorem guarantees that
they both equally divide the MET in all situations.

Theorem 3. Let $T = (N, E, M)$ be a MET where $\forall n, n' \in N$,
$wi_n = wi_{n'}$ and $\forall n \in N$, $wi_n > 0$, and let
$n_1, n_2 \in Sea(T)$ be two nodes. If $w_{n_1} = w_{n_2}$ then
$n_1 \equiv n_2$.
Corollary 1. Given a MET $T = (N, E, M)$ where $\forall n, n' \in N$,
$wi_n = wi_{n'}$ and $\forall n \in N$, $wi_n > 0$, and given a node
$n \in Sea(T)$, then $n$ optimally divides $Sea(T)$ if and only if
$Up(n) = Down(n)$.

While Corollary 1 states the objective of optimal D&Q (finding a node $n$ such
that $Up(n) = Down(n)$), Theorems 1 and 3 provide a method to approximate
this objective (finding a node $n$ such that $|Down(n) - Up(n)|$ is minimum in
$Sea(T)$).
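
The comparison of Theorem 1 can be checked numerically against the definition of the relation it characterizes. The sketch below is only illustrative: the function names imbalance and dividesBetter are hypothetical, all nodes are assumed to share the individual weight wi, and Up and Down are obtained from Equations 1 and 2 rather than from a tree.

```haskell
-- |Down(n') - Up(n')| computed from Equations 1 and 2, for a node of
-- weight w in a MET of weight wN whose nodes all weigh wi.
imbalance :: Double -> Double -> Double -> Double
imbalance wN wi w = abs (downW - upW)
  where
    downW = w - wi      -- Equation 2
    upW   = wN - w      -- Equations 1 and 2

-- The test of Theorem 1; it assumes w1 > w2.
dividesBetter :: Double -> Double -> Double -> Double -> Bool
dividesBetter wN wi w1 w2 = wN > w1 + w2 - wi
```

For instance, with wN = 20, wi = 1, w1 = 12 and w2 = 8, the test holds (20 > 19), in agreement with the imbalances 3 and 5 of the two nodes.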

An algorithm for Optimal D&Q. Theorems 1 and 2 provide the equation
$w_n > w_{n_1} + w_{n_2} - wi_n$ to compare two nodes $n_1, n_2$ by efficiently
determining $n_1 \gg n_2$, $n_1 \equiv n_2$ or $n_1 \ll n_2$. However, with
only this equation, we would have to compare all nodes to select the best of
them (i.e., $n$ such that $\nexists n'$, $n' \gg n$). Hence, in this section
we provide an algorithm that allows us to find the best node in a MET with a
minimum set of node comparisons.

Given a MET, Algorithm 2 efficiently determines the best node to divide
$Sea(T)$ by half (in the following, the optimal node). In order to find this
node, the algorithm does not need to compare all nodes in the MET. It follows a
path of nodes from the root to the optimal node that is closest to the root,
producing a minimum set of comparisons.



Algorithm 2 Optimal D&Q (SelectNode in Algorithm 1)
Input: A MET $T = (N, E, M)$ whose root is $n \in N$,
       $\forall n', n'' \in N$, $wi_{n'} = wi_{n''}$ and $\forall n' \in N$, $wi_{n'} > 0$
Output: A node $n_{optimal} \in N$
Preconditions: $\exists n' \in N$, $M(n') = Undefined$

begin
(1)  $Candidate = n$
(2)  do
(3)      $Best = Candidate$
(4)      $Children = \{m \mid (Best \to m) \in E\}$
(5)      if ($Children = \emptyset$) then return $Best$
(6)      $Candidate = n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} \ge w_{n''}$
(7)  while ($w_{Candidate} > \frac{w_n}{2}$)
(8)  if ($M(Best) = Wrong$) then return $Candidate$
(9)  if ($w_n \ge w_{Best} + w_{Candidate} - wi_n$) then return $Best$
(10) else return $Candidate$
end
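
A possible rendering of Algorithm 2 in Haskell is sketched below. It is an illustration under assumptions rather than the authors' code: the MET type, the node names used as identifiers, and the functions name, weight and selectNode are hypothetical; the common individual weight wi is passed as a parameter; and a tie in the final comparison is resolved in favour of Best (the node closer to the root), which is an assumption of this sketch.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

data Mark = Wrong | Undefined deriving Eq

-- A MET node: a name, its mark and its children. All nodes share the
-- same individual weight, passed separately as wi.
data MET = Node String Mark [MET]

name :: MET -> String
name (Node s _ _) = s

-- w_n restricted to the search area.
weight :: Double -> MET -> Double
weight wi (Node _ mark cs) =
  (if mark == Undefined then wi else 0) + sum (map (weight wi) cs)

selectNode :: Double -> MET -> String
selectNode wi root = go root
  where
    w  = weight wi
    wN = w root
    go best@(Node _ mark cs)
      | null cs                    = name best       -- line 5
      | w cand > wN / 2            = go cand         -- line 7: keep descending
      | mark == Wrong              = name cand       -- line 8
      | wN >= w best + w cand - wi = name best       -- line 9
      | otherwise                  = name cand       -- line 10
      where cand = maximumBy (comparing w) cs        -- line 6: heaviest child
```

On a MET of weight 20 whose root has children of weights 12, 5 and 2, and where the node of weight 12 has children of weights 8 and 3, the function descends to the node of weight 12, stops because 8 does not exceed 10, and returns the node of weight 12 since 20 >= 12 + 8 - 1.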



Example 3. Consider the MET in Figure 7 where $\forall n \in N$, $wi_n = 1$ and
$M(n) = Undefined$. Observe that Algorithm 2 only needs to apply the equation
in Theorem 1 once to identify an optimal node. Firstly, it traverses the MET
top-down from the root, selecting at each level the heaviest node until it
finds a node whose weight is smaller than half of the MET ($\frac{w_n}{2}$),
thus defining a path in the MET that is colored in gray. Then, the algorithm
uses the equation $w_n > w_{n_1} + w_{n_2} - wi_n$ to compare nodes $n_1$ and
$n_2$. Finally, the algorithm selects the node with weight 12.

Fig. 7. Defining a path in a MET to find the optimal node

In order to prove the correctness of Algorithm 2 we need to prove that (1)
the node returned is really an optimal node, and (2) this node will always be
found by the algorithm (i.e., it is always in the path defined by the
algorithm).

The first point can be proven with Theorems 1, 2 and 3. The second point
is the key idea of the algorithm and it relies on an interesting property of
the path defined: while defining the path in the MET, only four cases are
possible, and all of them coincide in that the subtree of the heaviest node
will contain an optimal node.

In particular, when we use Algorithm 2 and compare two nodes $n_1, n_2$ in a
MET whose root is $n$, we find four possible cases:

Case 1: $n_1$ and $n_2$ are brothers.
Case 2: $w_{n_1} > w_{n_2} \wedge w_{n_2} > \frac{w_n}{2}$.
Case 3: $w_{n_1} > \frac{w_n}{2} \wedge w_{n_2} \le \frac{w_n}{2}$.
Case 4: $w_{n_1} > w_{n_2} \wedge w_{n_1} \le \frac{w_n}{2}$.

We have proven (the individual proofs are part of the proof of Theorem 4)
that in cases 1 and 4, the heaviest node is better (i.e., if
$w_{n_1} > w_{n_2}$ then $n_1 \gg n_2$); in case 2, the lightest node is
better; and in case 3, the best node must be determined with the equations of
Theorems 1, 2 and 3. Observe that these results allow the algorithm to
determine the path to the optimal node that is closest to the root. For
instance, in Example 3 case 1 is used to select a child, e.g., node 12 instead
of node 5 or node 2, and node 8 instead of node 3. Case 2 is used to go down
and select node 12 instead of node 20. Case 4 is used to stop going down at
node 8 because it is better than all its descendants. And it is also used to
determine that nodes 2, 3 and 5 are better than all their descendants.
Finally, case 3 is used to select the optimal node, 12 instead of 8. Note that
D&Q could have selected node 8, which is as close to $\frac{w_n}{2}$ as node
12; but it is suboptimal because $Up(8) = 12$ and $Down(8) = 7$ whereas
$Up(12) = 8$ and $Down(12) = 11$.

Fig. 8. Determining the best node in a MET (four possible cases)

The correctness of Algorithm 2 is stated by the following theorem.

Theorem 4 (Correctness). Let $T = (N, E, M)$ be a MET where
$\forall n, n' \in N$, $wi_n = wi_{n'}$ and $\forall n \in N$, $wi_n > 0$.
Then the execution of Algorithm 2 with $T$ as input always terminates,
producing as output a node $n \in Sea(T)$ such that
$\nexists n' \in Sea(T) \mid n' \gg n$.

Algorithm 2 always returns a single optimal node. However, the equation
in Theorem 1 in combination with the equation in Theorem 2 can be used to
identify all optimal nodes in the MET. This is implemented in Algorithm 3,
which is complete, and thus it returns nodes 2 and 4 in the MET of Figure 4,
where D&Q can only detect node 2 as optimal.



3.2 Debugging METs where nodes can have different individual
weights in $\mathbb{R}^+ \cup \{0\}$

In this section we generalize Divide and Query to the case where nodes can have
different individual weights and these weights can be any value greater than or
equal to zero. As shown in Figure 5, in this general case traditional D&Q fails
to identify the optimal node (it selects node $n_1$ but the optimal node is
$n_2$). The algorithm presented in the previous section is also suboptimal
when the individual weights can be different. For instance, in the MET of
Figure 5 it would select node $n_3$. For this reason, in this section we
introduce Algorithm 4, a general algorithm able to identify an optimal node in
all cases. This does not mean that Algorithm 2 is useless. Algorithm 2 is
optimal when all nodes have the same weight, and in that case it is more
efficient than Algorithm 4. Theorem 5 ensures the finiteness and correctness of
Algorithm 4.

Theorem 5 (Correctness). Let $T = (N, E, M)$ be a MET where
$\forall n \in N$, $wi_n \ge 0$. Then the execution of Algorithm 4 with $T$ as
input always terminates, producing as output a node $n \in Sea(T)$ such that
$\nexists n' \in Sea(T) \mid n' \gg n$.
Algorithm 3 Optimal D&Q (Complete) (SelectNode in Algorithm 1)
Input: A MET $T = (N, E, M)$ whose root is $n \in N$,
       $\forall n', n'' \in N$, $wi_{n'} = wi_{n''}$ and $\forall n' \in N$, $wi_{n'} > 0$
Output: A set of nodes $O \subseteq N$
Preconditions: $\exists n' \in N$, $M(n') = Undefined$

begin
(1)  $Candidate = n$
(2)  do
(3)      $Best = Candidate$
(4)      $Children = \{m \mid (Best \to m) \in E\}$
(5)      if ($Children = \emptyset$) then return $\{Best\}$
(6)      $Candidate = n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} \ge w_{n''}$
(7)  while ($w_{Candidate} > \frac{w_n}{2}$)
(8)  $Candidates = \{n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} \ge w_{n''}\}$
(9)  if ($M(Best) = Wrong$) then return $Candidates$
(10) if ($w_n > w_{Best} + w_{Candidate} - wi_n$) then return $\{Best\}$
(11) if ($w_n = w_{Best} + w_{Candidate} - wi_n$) then return $\{Best\} \cup Candidates$
(12) else return $Candidates$
end



3.3 Debugging METs where nodes can have different individual
weights in $\mathbb{R}^+$



In the previous section we provided an algorithm that optimally selects an op- 
timal node of the MET with a minimum set of node comparisons. But this 
algorithm is not complete due to the fact that we allow the nodes to have an 
individual weight of zero. For instance, when all nodes have an individual weight 
of zero, Algorithm [4] returns a single optimal node, but it is not able to find all 
optimal nodes. 

Given a node (say $n$), the difference between having an individual weight of
zero, $wi_n = 0$, and having a (total) weight of zero, $w_n = 0$, should be clear.
The former means that this node did not cause the bug; the latter means that
none of the descendants of this node (nor the node itself) caused the bug.
Surprisingly, the use of nodes with individual weights of zero has not been
exploited in the literature. Assigning a (total) weight of zero to a node has
been used, for instance, in the technique called Trusting [10]. This technique
allows the user to trust a method. When this happens, all the nodes related to
this method and their descendants are pruned from the tree (i.e., these nodes
have a (total) weight of zero).
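
The distinction between individual and total weight can be illustrated with a small sketch. The tree, the node names and the helper `total_weight` below are hypothetical and chosen only for illustration; the sketch assumes that the total weight of a node is its individual weight plus the total weights of its children.

```python
from typing import Dict, List


def total_weight(children: Dict[str, List[str]], wi: Dict[str, int], node: str) -> int:
    """Return w_node = wi_node + sum of the total weights of the children of node."""
    return wi[node] + sum(total_weight(children, wi, c) for c in children.get(node, []))


# Hypothetical execution tree: main calls f and g, and f calls h.
children = {"main": ["f", "g"], "f": ["h"]}
# f has an individual weight of zero: f itself cannot be buggy, but its child h can.
wi = {"main": 1, "f": 0, "g": 1, "h": 1}

w_f = total_weight(children, wi, "f")        # individual weight 0, total weight > 0
w_main = total_weight(children, wi, "main")

# Trusting g: g and its descendants are pruned, so they contribute a total weight of zero.
pruned = {parent: [c for c in kids if c != "g"] for parent, kids in children.items()}
w_main_trusting_g = total_weight(pruned, wi, "main")
```

Here `f` keeps a positive total weight although its individual weight is zero, whereas trusting `g` removes its whole subtree from the search.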

If we add the restriction that nodes cannot be assigned an individual weight
of zero, then we can refine Algorithm 4 to ensure completeness. This refined
version is Algorithm 5.

Algorithm 4 Optimal D&Q General (SelectNode in Algorithm 1)

Input: A MET $T = (N, E, M)$ whose root is $n \in N$ and $\forall n' \in N$, $wi_{n'} \geq 0$
Output: A node $n_{optimal} \in N$
Preconditions: $\exists n' \in N$, $M(n') = Undefined$

begin
(1)  $Candidate = n$
(2)  do
(3)      $Best = Candidate$
(4)      $Children = \{m \mid (Best \to m) \in E\}$
(5)      if ($Children = \emptyset$) then return $Best$
(6)      $Candidate = n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} \geq w_{n''}$
(7)  while ($w_{Candidate} - \frac{wi_{Candidate}}{2} > \frac{w_n}{2}$)
(8)  $Candidate = n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} - \frac{wi_{n'}}{2} \geq w_{n''} - \frac{wi_{n''}}{2}$
(9)  if ($M(Best) = Wrong$) then return $Candidate$
(10) if ($w_n \geq w_{Best} + w_{Candidate} - \frac{wi_{Best}}{2} - \frac{wi_{Candidate}}{2}$) then return $Best$
(11) else return $Candidate$
end
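
As an illustration only, Algorithm 4 can be sketched in Python as follows. The `Node` class, the `wrong` flag (standing for $M(n) = Wrong$) and the function names are assumptions of this sketch and are not part of the original formulation. Integer weights are assumed, and the comparisons of lines (7), (8) and (10) are multiplied by two so that no fractions are needed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    wi: int                                    # individual weight, wi >= 0
    children: List["Node"] = field(default_factory=list)
    wrong: bool = False                        # True if M(node) = Wrong, otherwise Undefined
    w: int = 0                                 # total weight, filled in by set_weights


def set_weights(node: Node) -> int:
    """Compute w = wi + sum of the children's total weights for every node."""
    node.w = node.wi + sum(set_weights(c) for c in node.children)
    return node.w


def select_node(root: Node) -> Node:
    set_weights(root)
    candidate = root                                               # line (1)
    while True:                                                    # lines (2)-(7)
        best = candidate                                           # line (3)
        if not best.children:                                      # line (5)
            return best
        candidate = max(best.children, key=lambda c: c.w)          # line (6)
        # line (7): w_Candidate - wi_Candidate/2 > w_n/2, multiplied by two
        if not (2 * candidate.w - candidate.wi > root.w):
            break
    candidate = max(best.children, key=lambda c: 2 * c.w - c.wi)   # line (8)
    if best.wrong:                                                 # line (9)
        return candidate
    # line (10), multiplied by two
    if 2 * root.w >= 2 * best.w + 2 * candidate.w - best.wi - candidate.wi:
        return best
    return candidate                                               # line (11)


# Hypothetical MET with uniform weights: r has children a and b, a has three leaves.
uniform = Node("r", 1, [Node("a", 1, [Node("a1", 1), Node("a2", 1), Node("a3", 1)]),
                        Node("b", 1)])
# Hypothetical MET with different weights and a root marked as Wrong.
variable = Node("r", 0, [Node("x", 5),
                         Node("y", 2, [Node("y1", 1), Node("y2", 1)])], wrong=True)

uniform_choice = select_node(uniform).name
variable_choice = select_node(variable).name
```

In the first tree the node `a` leaves 3 units below it and 2 above it, so it is the best division; in the second tree the root cannot be asked about, and `y` (distance 3) is preferred over the heavier leaf `x` (distance 4).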



Algorithm 5 Optimal D&Q General (Complete) (SelectNode in Algorithm 1)

Input: A MET $T = (N, E, M)$ whose root is $n \in N$ and $\forall n' \in N$, $wi_{n'} > 0$
Output: A set of nodes $O \subseteq N$
Preconditions: $\exists n' \in N$, $M(n') = Undefined$

begin
(1)  $Candidate = n$
(2)  do
(3)      $Best = Candidate$
(4)      $Children = \{m \mid (Best \to m) \in E\}$
(5)      if ($Children = \emptyset$) then return $\{Best\}$
(6)      $Candidate = n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} \geq w_{n''}$
(7)  while ($w_{Candidate} - \frac{wi_{Candidate}}{2} > \frac{w_n}{2}$)
(8)  $Candidates = \{n' \mid \forall n''$ with $n', n'' \in Children$, $w_{n'} - \frac{wi_{n'}}{2} \geq w_{n''} - \frac{wi_{n''}}{2}\}$
(9)  $Candidate = n' \in Candidates$
(10) if ($M(Best) = Wrong$) then return $Candidates$
(11) if ($w_n > w_{Best} + w_{Candidate} - \frac{wi_{Best}}{2} - \frac{wi_{Candidate}}{2}$) then return $\{Best\}$
(12) if ($w_n = w_{Best} + w_{Candidate} - \frac{wi_{Best}}{2} - \frac{wi_{Candidate}}{2}$) then
         return $\{Best\} \cup Candidates$
(13) else return $Candidates$
end
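
The complete variant can be sketched in the same spirit. The dictionary-based tree encoding, the function name `select_nodes` and the `wrong` parameter are assumptions made for this illustration; positive integer weights are assumed and all comparisons are multiplied by two to stay within integer arithmetic.

```python
from typing import Dict, FrozenSet, List, Set


def select_nodes(children: Dict[str, List[str]], wi: Dict[str, int], root: str,
                 wrong: FrozenSet[str] = frozenset()) -> Set[str]:
    """Sketch of Algorithm 5: return every optimal node found along the heaviest path."""
    w: Dict[str, int] = {}

    def fill(n: str) -> int:
        w[n] = wi[n] + sum(fill(c) for c in children.get(n, []))
        return w[n]

    def key(n: str) -> int:
        return 2 * w[n] - wi[n]                 # twice (w_n - wi_n / 2)

    fill(root)
    candidate = root                                        # line (1)
    while True:                                             # lines (2)-(7)
        best = candidate                                    # line (3)
        kids = children.get(best, [])                       # line (4)
        if not kids:                                        # line (5)
            return {best}
        candidate = max(kids, key=lambda c: w[c])           # line (6)
        if not (key(candidate) > w[root]):                  # line (7), multiplied by two
            break
    top = max(key(c) for c in kids)
    candidates = {c for c in kids if key(c) == top}         # line (8)
    candidate = next(iter(candidates))                      # line (9)
    if best in wrong:                                       # line (10)
        return candidates
    lhs = 2 * w[root]
    rhs = key(best) + key(candidate)    # twice (w_Best + w_Candidate - wi_Best/2 - wi_Candidate/2)
    if lhs > rhs:                                           # line (11)
        return {best}
    if lhs == rhs:                                          # line (12)
        return {best} | candidates
    return candidates                                       # line (13)


ones = {"r": 1, "a": 1, "b": 1, "c": 1, "a1": 1, "b1": 1, "c1": 1}
# Hypothetical chain r -> a -> c -> c1: a and c divide the tree equally well.
chain_result = select_nodes({"r": ["a"], "a": ["c"], "c": ["c1"]}, ones, "r")
# Hypothetical tree where the two children of the root are equally good.
twins_result = select_nodes({"r": ["a", "b"], "a": ["a1"], "b": ["b1"]}, ones, "r")
```

Because all members of `Candidates` share the same value of $w_{n'} - \frac{wi_{n'}}{2}$, the choice made in line (9) does not affect the comparisons of lines (11) and (12).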



4 Conclusion 

For three decades, Divide & Query has been the most efficient algorithmic
debugging strategy. On the practical side, all current algorithmic debuggers
implement D&Q [1,3,5,8,11,12,13,14,15], and experiments [2,18] (see also
http://users.dsic.upv.es/~jsilva/DDJ/#Experiments) demonstrate that it performs
on average 2-36% fewer questions than other strategies. On the theoretical
side, because D&Q intends a dichotomic search, it has been thought optimal
with respect to the number of questions performed, and thus research on
algorithmic debugging strategies has focused on other aspects such as reducing
the complexity of questions.

In this work we show that in some situations current algorithms for D&Q are 
incomplete and inefficient because they are not able to find all optimal nodes, 
and sometimes they return nodes that are not optimal. We have identified the 
sources of inefficiency and provided examples that show both the incompleteness 
and incorrectness of the technique. 

The main contribution of this work is a new algorithm for D&Q that is
optimal in all cases, including a generalization of the technique where all nodes
of the ET can have different individual weights in $\mathbb{R}^{+} \cup \{0\}$.
The algorithm has been proved terminating and correct. A slightly modified
version of the algorithm has also been provided that returns all optimal
solutions, thus being complete.

We have implemented the technique and experiments show that it is more
efficient than all previous algorithms (see column D&QO in Figure 2). The
implementation, including the source code, and the experiments are publicly
available at: http://users.dsic.upv.es/~jsilva/DDJ.

A Proofs of Technical Results

In this section, for the sake of clarity, we use $u_n$ and $d_n$ instead of $Up(n)$
and $Down(n)$ respectively. Moreover, we distinguish between two kinds of METs
to prove the theorems of Sections 3.1 and 3.2 respectively.



Definition 3 (Uniform MET). A uniform MET $T = (N, E, M)$ is a MET
where $\forall n, n' \in N$, $wi_n = wi_{n'}$ and $\forall n \in N$, $wi_n > 0$.

Definition 4 (Variable MET). A variable MET $T = (N, E, M)$ is a MET
where $\forall n \in N$, $wi_n \geq 0$.

A.1 Proofs of Theorems 1, 2 and 3

Here, we prove Theorems 1, 2 and 3, which are used in Algorithm 2 to compare
nodes of the MET and determine which of them is better. For the proof of
Theorem 1 we first need to prove the following lemma.

Lemma 1. Let $T = (N, E, M)$ be a uniform MET whose root is $n \in N$, and let
$n_1, n_2 \in Sea(T)$ be two nodes. Then, $n_1 \gg n_2$ if and only if
$u_{n_1} * d_{n_1} > u_{n_2} * d_{n_2}$.

Proof. We prove that $u_{n_1} * d_{n_1} > u_{n_2} * d_{n_2}$ implies
$|d_{n_1} - u_{n_1}| < |d_{n_2} - u_{n_2}|$ and vice versa. This can be shown by
developing the equation $u_{n_1} * d_{n_1} > u_{n_2} * d_{n_2}$.

Firstly, note that $w_n = \sum \{wi_{n'} \mid n' \in Sea(T)\}$, then by Equation 1 we
know that $w_n = u_{n_1} + d_{n_1} + wi_{n_1} = u_{n_2} + d_{n_2} + wi_{n_2}$. Therefore,
as $wi_{n_1} = wi_{n_2} = wi_n$, the optimal division of $Sea(T)$ happens when
$u_{n_1} = d_{n_1} = \frac{w_n - wi_n}{2}$. For the sake of simplicity in the notation,
let $c = \frac{w_n - wi_n}{2}$ and let $h_1 = c - d_{n_1} = u_{n_1} - c$ and
$h_2 = c - d_{n_2} = u_{n_2} - c$. Then,

$$u_{n_1} * d_{n_1} > u_{n_2} * d_{n_2}$$

Therefore, we replace $u_{n_1}$, $d_{n_1}$, $u_{n_2}$ and $d_{n_2}$:

$$(c + h_1) * (c - h_1) > (c + h_2) * (c - h_2)$$
$$c^2 - h_1 * c + h_1 * c - h_1^2 > c^2 - h_2 * c + h_2 * c - h_2^2$$

We simplify:

$$c^2 - h_1^2 > c^2 - h_2^2$$
$$-h_1^2 > -h_2^2$$
$$h_1^2 < h_2^2$$

And finally we obtain that:

$$|h_1| < |h_2|$$

Hence, if the product $u_{n_1} * d_{n_1}$ is greater than $u_{n_2} * d_{n_2}$ then
$|h_1| < |h_2|$ and thus, because $h_1$ and $h_2$ represent distances to the
center, $n_1 \gg n_2$.

Theorem 1. Given a uniform MET $T = (N, E, M)$ whose root is $n \in N$,
and given two nodes $n_1, n_2 \in Sea(T)$, with $w_{n_1} > w_{n_2}$, $n_1 \gg n_2$ if and
only if $w_n > w_{n_1} + w_{n_2} - wi_n$.

Proof. By Lemma 1 we know that if $u_{n_1} * d_{n_1} > u_{n_2} * d_{n_2}$ then
$n_1 \gg n_2$. Thus it is enough to prove that $w_n > w_{n_1} + w_{n_2} - wi_n$ implies
$u_{n_1} * d_{n_1} > u_{n_2} * d_{n_2}$ and vice versa when $w_{n_1} > w_{n_2}$.

$$w_n > w_{n_1} + w_{n_2} - wi_n$$

Adding $wi_n - wi_n$:

$$w_n > w_{n_1} + w_{n_2} - 2 * wi_n + wi_n$$

We replace $w_{n_1}$, $w_{n_2}$ by Equation 2:

$$w_n > d_{n_1} + d_{n_2} + wi_n$$

Adding $wi_n * \frac{d_{n_1}}{d_{n_1} - d_{n_2}} - wi_n * \frac{d_{n_1}}{d_{n_1} - d_{n_2}}$
(the denominator is not zero because $w_{n_1} > w_{n_2}$):

$$w_n > d_{n_1} + d_{n_2} + wi_n * \frac{d_{n_1}}{d_{n_1} - d_{n_2}} + wi_n * \left(1 - \frac{d_{n_1}}{d_{n_1} - d_{n_2}}\right)$$

Using $1 - \frac{d_{n_1}}{d_{n_1} - d_{n_2}} = \frac{-d_{n_2}}{d_{n_1} - d_{n_2}}$ we get:

$$w_n > d_{n_1} + d_{n_2} + \frac{d_{n_1} * wi_n}{d_{n_1} - d_{n_2}} - \frac{d_{n_2} * wi_n}{d_{n_1} - d_{n_2}}$$

Because $d_{n_1} + d_{n_2} = \frac{d_{n_1}^2 - d_{n_2}^2}{d_{n_1} - d_{n_2}}$ then:

$$w_n > \frac{d_{n_1}^2 - d_{n_2}^2}{d_{n_1} - d_{n_2}} + \frac{d_{n_1} * wi_n}{d_{n_1} - d_{n_2}} - \frac{d_{n_2} * wi_n}{d_{n_1} - d_{n_2}}$$

Because $w_{n_1} > w_{n_2}$ we know by Equation 2 that $d_{n_1} - d_{n_2} > 0$, thus:

$$(d_{n_1} - d_{n_2}) * w_n > d_{n_1}^2 - d_{n_2}^2 + d_{n_1} * wi_n - d_{n_2} * wi_n$$
$$d_{n_1} * w_n - d_{n_2} * w_n > d_{n_1}^2 - d_{n_2}^2 + d_{n_1} * wi_n - d_{n_2} * wi_n$$
$$d_{n_1} * w_n - d_{n_1}^2 - d_{n_1} * wi_n > d_{n_2} * w_n - d_{n_2}^2 - d_{n_2} * wi_n$$
$$d_{n_1} * (w_n - d_{n_1} - wi_n) > d_{n_2} * (w_n - d_{n_2} - wi_n)$$

As $wi_n = wi_{n_1} = wi_{n_2}$ we replace $w_n - d_{n_1} - wi_{n_1}$ and
$w_n - d_{n_2} - wi_{n_2}$ by Equation 1:

$$d_{n_1} * u_{n_1} > d_{n_2} * u_{n_2}$$

Theorem 2. Given a uniform MET $T = (N, E, M)$ whose root is $n \in N$,
and given two nodes $n_1, n_2 \in Sea(T)$, with $w_{n_1} > w_{n_2}$, $n_1 \equiv n_2$ if and
only if $w_n = w_{n_1} + w_{n_2} - wi_n$.

Proof. The proof is completely analogous to the proof of Theorem 1. The only
difference is that the equation that is developed should be
$w_n = w_{n_1} + w_{n_2} - wi_n$.
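
Theorems 1 and 2 can also be checked numerically. The following sketch is not part of the proofs: it generates random uniform trees (all nodes Undefined, individual weight 1 by default), and compares the weight-based tests with the distances $|d_n - u_n|$ for every pair of nodes with $w_{n_1} > w_{n_2}$. The parent-list encoding and the helper names are assumptions of this illustration.

```python
import random
from typing import List, Optional


def random_tree(size: int, rng: random.Random) -> List[Optional[int]]:
    """Parent list: node 0 is the root, node i > 0 gets a random parent with a smaller index."""
    return [None] + [rng.randrange(i) for i in range(1, size)]


def total_weights(parent: List[Optional[int]], wi: int = 1) -> List[int]:
    w = [wi] * len(parent)
    # Children always have larger indices than their parents, so a reverse
    # sweep adds each finished subtree weight to its parent.
    for i in range(len(parent) - 1, 0, -1):
        w[parent[i]] += w[i]
    return w


def theorems_hold(parent: List[Optional[int]], wi: int = 1) -> bool:
    w = total_weights(parent, wi)
    wn = w[0]

    def distance(i: int) -> int:
        d = w[i] - wi          # Equation 2: d = w - wi
        u = wn - w[i]          # Equation 1: u = w_n - d - wi
        return abs(d - u)

    for a in range(len(parent)):
        for b in range(len(parent)):
            if w[a] > w[b]:
                bound = w[a] + w[b] - wi
                if (distance(a) < distance(b)) != (wn > bound):    # Theorem 1
                    return False
                if (distance(a) == distance(b)) != (wn == bound):  # Theorem 2
                    return False
    return True


rng = random.Random(0)
all_ok = all(theorems_hold(random_tree(rng.randint(1, 30), rng)) for _ in range(200))
```

Such a check does not replace the proofs, but it is a cheap way to catch transcription errors in the inequalities.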

Theorem 3. Let $T = (N, E, M)$ be a uniform MET, and let $n_1, n_2 \in Sea(T)$
be two nodes, if $w_{n_1} = w_{n_2}$ then $n_1 \equiv n_2$.

Proof. We prove that $w_{n_1} = w_{n_2}$ implies $|d_{n_1} - u_{n_1}| = |d_{n_2} - u_{n_2}|$
and thus $n_1 \equiv n_2$:

$$w_{n_1} = w_{n_2}$$

We replace $w_{n_1}$, $w_{n_2}$ by Equation 2:

$$d_{n_1} + wi_{n_1} = d_{n_2} + wi_{n_2}$$

Using $wi_{n_1} = wi_{n_2}$:

$$d_{n_1} = d_{n_2}$$

Using $w_{n_1} = w_{n_2}$:

$$w_{n_1} - w_n + d_{n_1} = w_{n_2} - w_n + d_{n_2}$$

Replacing $w_{n_1}$, $w_{n_2}$ by Equation 2 and $w_n$ by Equation 1:

$$(d_{n_1} + wi_{n_1}) - (u_{n_1} + d_{n_1} + wi_{n_1}) + d_{n_1} = (d_{n_2} + wi_{n_2}) - (u_{n_2} + d_{n_2} + wi_{n_2}) + d_{n_2}$$

We simplify:

$$d_{n_1} - u_{n_1} = d_{n_2} - u_{n_2}$$
$$|d_{n_1} - u_{n_1}| = |d_{n_2} - u_{n_2}|$$

Corollary 1. Given a uniform MET $T = (N, E, M)$, and given a node
$n \in Sea(T)$, then $n$ optimally divides $Sea(T)$ if and only if $u_n = d_n$.

Proof. If $n$ optimally divides $Sea(T)$ then the product $u_n * d_n$ is maximum,
and there does not exist another node $n' \in Sea(T)$ such that
$u_{n'} * d_{n'} > u_n * d_n$. This can be easily shown taking into account that
the graph of the product is a parabola whose vertex is the maximum value.
Therefore, we can compute the maximum by differentiating the product.

For simplicity, let $prod = u_n * d_n$ and $sum = u_n + d_n$. Then, we start by
transforming the equation $u_n * d_n$ in such a way that it only depends on one
of the factors (e.g., $u_n$):

$$u_n * d_n = prod$$

We replace $d_n$:

$$u_n * (sum - u_n) = prod$$
$$u_n * sum - u_n^2 = prod$$

We differentiate the equation and equate it to zero:

$$\frac{d}{d u_n}(u_n * sum - u_n^2) = 0$$
$$sum - 2 u_n = 0$$

And finally we get the value of $u_n$ in the vertex:

$$u_n = \frac{sum}{2}$$

Now, we can infer $d_n$ from $u_n$ by simply replacing the value of $u_n$ in the
equation $u_n + d_n = sum$:

$$\frac{sum}{2} + d_n = sum$$
$$d_n = sum - \frac{sum}{2}$$
$$d_n = \frac{sum}{2}$$
$$d_n = u_n$$

A.2 Proof of Theorem 4

Theorem 4 states the correctness of Algorithm 2, used when all nodes have the
same individual weight. Firstly, we prove the following auxiliary lemma.

Lemma 2. Let $T = (N, E, M)$ be a uniform MET whose root is $n \in N$ and
$n_1, n_2 \in Sea(T)$ with $w_{n_1} > w_{n_2}$, if $w_n \geq w_{n_1} + w_{n_2}$ then
$n_1 \gg n_2$.

Proof. Firstly, by Theorem 1 we know that if $w_n > w_{n_1} + w_{n_2} - wi_n$ when
$w_{n_1} > w_{n_2}$ then $n_1 \gg n_2$. Therefore, as $wi_n > 0$, if
$w_n \geq w_{n_1} + w_{n_2}$ then $w_n > w_{n_1} + w_{n_2} - wi_n$ and hence $n_1 \gg n_2$.

In order to prove the correctness of Algorithm 2, we also need to prove the
four cases presented in Section 3.1 that are used in the algorithm:

Case 1: $n_1$ and $n_2$ are brothers.

Case 2: $w_{n_1} > w_{n_2} \wedge w_{n_2} > \frac{w_n}{2}$.

Case 3: $w_{n_1} > \frac{w_n}{2} \wedge w_{n_2} \leq \frac{w_n}{2}$.

Case 4: $w_{n_1} > w_{n_2} \wedge w_{n_1} \leq \frac{w_n}{2}$.

We prove each case in a separate lemma. In case 1, the following lemma
shows that given two brother nodes $n_1$ and $n_2$, the heaviest node is better.

Lemma 3. Given a uniform MET $T = (N, E, M)$ whose root is $n \in N$ and
given three nodes $n_1 \in N$ and $n_2, n_3 \in Sea(T)$ with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, $n_2 \gg n_3 \vee n_2 \equiv n_3$ if and only if
$w_{n_2} \geq w_{n_3}$.

Proof. We prove first that $w_{n_2} \geq w_{n_3}$ implies $n_2 \gg n_3 \vee n_2 \equiv n_3$:
Trivially, $w_n \geq w_{n_2} + w_{n_3}$ because $n_2$ and $n_3$ are children of $n_1$ and
$n_1$ is a descendant of $n$. Therefore, by Lemma 2 and Theorem 3,
$n_2 \gg n_3 \vee n_2 \equiv n_3$. Now, we prove that $n_2 \gg n_3 \vee n_2 \equiv n_3$
implies $w_{n_2} \geq w_{n_3}$: We prove it by contradiction assuming that
$w_{n_2} < w_{n_3}$ when $n_2 \gg n_3 \vee n_2 \equiv n_3$, and proving that when
$w_{n_2} < w_{n_3}$ and $n_2 \gg n_3 \vee n_2 \equiv n_3$, neither
$w_n > w_{n_2} + w_{n_3} - wi_n$ nor $w_n \leq w_{n_2} + w_{n_3} - wi_n$ holds. By
Theorem 1, $w_n > w_{n_2} + w_{n_3} - wi_n$ is false because
$n_2 \gg n_3 \vee n_2 \equiv n_3$. Moreover, because $n_2$ and $n_3$ are brothers, we
know that $w_n \geq w_{n_2} + w_{n_3}$, and hence $w_n \leq w_{n_2} + w_{n_3} - wi_n$ is
also false.

In case 2, the following lemma ensures that given two nodes $n_1$ and $n_2$ such
that $n_1 \to n_2$, if $w_{n_2} > \frac{w_n}{2}$ then $n_2$ is better.

Lemma 4. Given a uniform MET $T = (N, E, M)$ whose root is $n \in N$, and
given two nodes $n_1, n_2 \in Sea(T)$, with $(n_1 \to n_2) \in E$, if
$w_{n_2} > \frac{w_n}{2}$ then $n_2 \gg n_1$.

Proof. We prove the lemma by contradiction assuming that $n_1 \gg n_2$ or
$n_1 \equiv n_2$. First, we know that $w_{n_2} = \frac{w_n}{2} + inc_{n_2}$ with
$inc_{n_2} > 0$. And we know that
$w_{n_1} = \frac{w_n}{2} + inc_{n_2} + wi_n + inc_{n_1}$ with $inc_{n_1} \geq 0$, where
$inc_{n_1}$ represents the weight of the possible brothers of $n_2$. By Theorems 1
and 2 we know that $w_n \geq w_{n_1} + w_{n_2} - wi_n$ when $w_{n_1} > w_{n_2}$ implies
$n_1 \gg n_2 \vee n_1 \equiv n_2$.

$$w_n \geq w_{n_1} + w_{n_2} - wi_n$$

We replace $w_{n_1}$, $w_{n_2}$:

$$w_n \geq \left(\frac{w_n}{2} + inc_{n_2} + wi_n + inc_{n_1}\right) + \left(\frac{w_n}{2} + inc_{n_2}\right) - wi_n$$

We simplify:

$$w_n \geq \frac{w_n}{2} + inc_{n_2} + inc_{n_1} + \frac{w_n}{2} + inc_{n_2}$$
$$w_n \geq \frac{w_n}{2} + \frac{w_n}{2} + 2 * inc_{n_2} + inc_{n_1}$$
$$w_n \geq w_n + 2 * inc_{n_2} + inc_{n_1}$$
$$0 \geq 2 * inc_{n_2} + inc_{n_1}$$

But this is a contradiction with $inc_{n_2} > 0$. Hence, $n_2 \gg n_1$.

In case 4, the following lemma ensures that given two nodes whose weight is
not greater than $\frac{w_n}{2}$, the heaviest node is better.

Lemma 5. Given a uniform MET $T = (N, E, M)$ whose root is $n \in N$, and
two nodes $n_1, n_2 \in Sea(T)$, where $\frac{w_n}{2} \geq w_{n_1} > w_{n_2}$, then
$n_1 \gg n_2$.

Proof. We can assume that $w_{n_1} = \frac{w_n}{2} - dec_{n_1}$ and
$w_{n_2} = \frac{w_n}{2} - dec_{n_2}$ where $dec_{n_2} > dec_{n_1} \geq 0$. Moreover, we
know that $w_{n_1} + w_{n_2} = \frac{w_n}{2} - dec_{n_1} + \frac{w_n}{2} - dec_{n_2}$
and thus $w_{n_1} + w_{n_2} = w_n - dec_{n_1} - dec_{n_2}$. Therefore, because
$dec_{n_2} > dec_{n_1} \geq 0$, we deduce that $w_n > w_{n_1} + w_{n_2}$. And as
$w_{n_1} > w_{n_2}$ then, by Lemma 2, $n_1 \gg n_2$.

If two nodes $n_1$ and $n_2$ are brothers and $n_1$ is better than $n_2$ then $n_1$ is
better than any descendant of $n_2$. The following lemma proves this property,
which is complementary to Lemma 3 for case 1.

Lemma 6. Given a uniform MET $T = (N, E, M)$ whose root is $n \in N$ and four
nodes $n_1 \in N$ and $n_2, n_3, n_4 \in Sea(T)$ with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, $(n_3 \to n_4) \in E^+$, if
$n_2 \gg n_3 \vee n_2 \equiv n_3$ then $n_2 \gg n_4$.

Proof. First, as $n_2$ and $n_3$ are brothers and $n_2 \gg n_3 \vee n_2 \equiv n_3$
then, by Lemma 3, we know that $w_{n_2} \geq w_{n_3}$. We distinguish two cases:
$w_{n_2} > \frac{w_n}{2}$ and $\frac{w_n}{2} \geq w_{n_2}$.

If $\frac{w_n}{2} \geq w_{n_2}$ then $\frac{w_n}{2} \geq w_{n_3}$ and by Lemma 5
$n_3 \gg n_4$.

If $w_{n_2} > \frac{w_n}{2}$ then we only have to demonstrate that
$\frac{w_n}{2} > w_{n_3}$ and then (as before) by Lemma 5 $n_3 \gg n_4$.
This can be easily proved taking into account that $w_n \geq w_{n_2} + w_{n_3}$
because $n_2$ and $n_3$ are children of $n_1$ and $n_1$ is a descendant of $n$, and
that $w_{n_2} = \frac{w_n}{2} + inc_{n_2}$ with $inc_{n_2} > 0$.

$$w_n \geq w_{n_2} + w_{n_3}$$

We replace $w_{n_2}$:

$$w_n \geq \left(\frac{w_n}{2} + inc_{n_2}\right) + w_{n_3}$$
$$w_n - \frac{w_n}{2} \geq inc_{n_2} + w_{n_3}$$
$$\frac{w_n}{2} \geq inc_{n_2} + w_{n_3}$$

As $inc_{n_2} > 0$:

$$\frac{w_n}{2} > w_{n_3}$$

Therefore, as $n_2 \gg n_3 \vee n_2 \equiv n_3$ and $n_3 \gg n_4$ then $n_2 \gg n_4$.

The previous lemmas allow Algorithm 2 to find a path between the root
node and an optimal node. The correctness of this algorithm is proved by the
following theorem.

Theorem 4. Let $T = (N, E, M)$ be a uniform MET, then the execution of
Algorithm 2 with $T$ as input always terminates producing as output a node
$n \in Sea(T)$ such that $\nexists n' \in Sea(T) \mid n' \gg n$.

Proof. The finiteness of the algorithm is proved thanks to the following invariant:
$w_{Candidate}$ strictly decreases in each iteration. Therefore, because $N$ is finite,
$w_{Candidate}$ will eventually become smaller than or equal to $\frac{w_n}{2}$ and the
loop will terminate.

The correctness can be proved showing that after any number of iterations
the algorithm always finishes with an optimal node. We prove it by induction
on the number of iterations performed.

(Base Case) In the base case, only one iteration is executed. If the condition
in Line (5) is satisfied then the root is marked as undefined and it is trivially
the optimal node. This optimal node is returned in Line (5). Otherwise, Lines
(4) and (6) select the heaviest child of the root, the loop terminates and Lines
(9) or (10) return the optimal node.

Note that the root node, when it is marked as Wrong, can only be selected
in the first iteration. But even in this case, this node is never selected because
the root node must have at least one child marked as Undefined. Thus Line (5)
is not satisfied and Line (6) selects this node. If the condition of the loop is not
satisfied, then Line (8) returns the root's child.

(Induction Hypothesis) We assume as the induction hypothesis that after $i$
iterations, the algorithm has a candidate node $Best \in Sea(T)$ such that
$\forall n' \in Sea(T)$, $(Best \to n') \notin E^*$, $Best \gg n'$.

(Inductive Case) We now prove that the iteration $i + 1$ of the algorithm will
select a new candidate node $Candidate$ such that $Candidate \gg Best$, or it will
terminate selecting an optimal node.

Firstly, when the condition in Line (5) is satisfied $Best$ and $Candidate$ are
the same node (say $n'$). According to the induction hypothesis, this node is
better than any other of the nodes in the set
$\{n'' \in Sea(T) \mid (n' \to n'') \notin E^*\}$. Therefore, because $n'$ has no
children, it is an optimal node, and it is returned in Line (5). Otherwise, if
the condition in Line (5) is not satisfied, Line (7) in the algorithm ensures that
$w_{Best} > \frac{w_n}{2}$, $n$ being the root of $T$, because in the iteration $i$ the
loop did not terminate or because $Best$ is the root. Moreover, according to
Lines (4) and (6), we know that $Candidate$ is the heaviest child of $Best$. We
have two possibilities:

- $w_{Candidate} > \frac{w_n}{2}$: In this case the loop does not terminate and
  $\forall n' \in Sea(T)$, $(Candidate \to n') \notin E^*$, $Candidate \gg n'$. Firstly,
  by Lemma 4 we know that $Candidate \gg Best$, and thus, by the induction
  hypothesis we know that $\forall n' \in Sea(T)$, $(Best \to n') \notin E^*$,
  $Candidate \gg n'$. By Lemma 5, $Candidate \gg n' \vee Candidate \equiv n'$, $n'$
  being a brother of $Candidate$. But as we know that
  $w_{Candidate} > \frac{w_n}{2}$ then $Candidate \not\equiv n'$. Moreover, by Lemma 6
  we can ensure that $Candidate \gg n'$, $n'$ being a descendant of a candidate's
  brother.

- $w_{Candidate} \leq \frac{w_n}{2}$: In this case the loop terminates (Line (7)) and by
  Lemma 3 we know that $Candidate \gg n' \vee Candidate \equiv n'$, $n'$ being a
  brother of $Candidate$. Moreover, by Lemma 6 we can ensure that
  $Candidate \gg n'$, $n'$ being a descendant of a candidate's brother. Then
  equation ($w_n \geq w_{Best} + w_{Candidate} - wi_n$) is applied in Line (9) to
  select an optimal node. Theorems 1 and 2 ensure that the node selected is an
  optimal node because, according to Lemma 5, for every descendant $n'$ of
  $Candidate$, $Candidate \gg n'$.

A.3 Proof of Theorem 5

Theorem 5 states the correctness of Algorithm 4, used in the general case when
nodes can have different individual weights. For the proof of this theorem we
first define some auxiliary lemmas. The following lemma ensures that
$w_{n_1} - \frac{wi_{n_1}}{2} > \frac{w_n}{2}$, used in the condition of the loop,
implies $d_{n_1} > u_{n_1}$.



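Lemma 7. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$ and a
node $n_1 \in Sea(T)$, $d_{n_1} > u_{n_1}$ if and only if
$w_{n_1} - \frac{wi_{n_1}}{2} > \frac{w_n}{2}$.

Proof. We prove that $w_{n_1} - \frac{wi_{n_1}}{2} > \frac{w_n}{2}$ implies
$d_{n_1} > u_{n_1}$ and vice versa.

$$w_{n_1} - \frac{wi_{n_1}}{2} > \frac{w_n}{2}$$
$$2 w_{n_1} - wi_{n_1} > w_n$$

We replace $w_{n_1}$ using Equation 2:

$$2 (d_{n_1} + wi_{n_1}) - wi_{n_1} > w_n$$
$$2 d_{n_1} + wi_{n_1} > w_n$$
$$d_{n_1} > w_n - d_{n_1} - wi_{n_1}$$

We replace $w_n - d_{n_1} - wi_{n_1}$ using Equation 1:

$$d_{n_1} > u_{n_1}$$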
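
As a worked instance of Lemma 7 (the numbers are hypothetical and chosen only for illustration), take $w_n = 10$ and a node $n_1$ with $w_{n_1} = 7$, and compare two possible individual weights. Equation 2 gives $d_{n_1} = w_{n_1} - wi_{n_1}$ and Equation 1 gives $u_{n_1} = w_n - d_{n_1} - wi_{n_1}$.

```latex
\begin{align*}
wi_{n_1} = 2:&\quad w_{n_1} - \tfrac{wi_{n_1}}{2} = 7 - 1 = 6 > 5 = \tfrac{w_n}{2},
  &\quad d_{n_1} = 7 - 2 = 5,\; u_{n_1} = 10 - 5 - 2 = 3,\; d_{n_1} > u_{n_1} \\
wi_{n_1} = 6:&\quad w_{n_1} - \tfrac{wi_{n_1}}{2} = 7 - 3 = 4 \not> 5 = \tfrac{w_n}{2},
  &\quad d_{n_1} = 7 - 6 = 1,\; u_{n_1} = 10 - 1 - 6 = 3,\; d_{n_1} \not> u_{n_1}
\end{align*}
```

In both rows the loop condition and the comparison $d_{n_1} > u_{n_1}$ agree, even though the total weight $w_{n_1}$ is the same; this is why the general algorithm cannot decide on $w_{n_1}$ alone.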

The following lemma ensures that given two nodes $n_1$ and $n_2$ where
$d_n \geq u_n$ in both nodes and $n_1 \to n_2$ then $n_2 \gg n_1 \vee n_2 \equiv n_1$.

Lemma 8. Given a variable MET $T = (N, E, M)$ and given two nodes
$n_1, n_2 \in Sea(T)$, with $(n_1 \to n_2) \in E$, if $d_{n_2} \geq u_{n_2}$ then
$n_2 \gg n_1 \vee n_2 \equiv n_1$.

Proof. We prove that $|d_{n_2} - u_{n_2}| \leq |d_{n_1} - u_{n_1}|$ holds. First, we know
that $d_{n_1} = d_{n_2} + wi_{n_2} + inc$ and $u_{n_1} = u_{n_2} - wi_{n_1} - inc$ with
$inc \geq 0$, where $inc$ represents the weight of the possible brothers of $n_2$.

$$|d_{n_2} - u_{n_2}| \leq |d_{n_1} - u_{n_1}|$$

As we know that $d_n \geq u_n$ in both nodes:

$$d_{n_2} - u_{n_2} \leq d_{n_1} - u_{n_1}$$

We replace $d_{n_1}$ and $u_{n_1}$:

$$d_{n_2} - u_{n_2} \leq (d_{n_2} + wi_{n_2} + inc) - (u_{n_2} - wi_{n_1} - inc)$$
$$d_{n_2} - u_{n_2} \leq d_{n_2} - u_{n_2} + wi_{n_1} + wi_{n_2} + 2 inc$$
$$0 \leq wi_{n_1} + wi_{n_2} + 2 inc$$

Hence, because $wi_{n_1}, wi_{n_2}, inc \geq 0$, $|d_{n_2} - u_{n_2}| \leq |d_{n_1} - u_{n_1}|$
is satisfied and thus $n_2 \gg n_1 \vee n_2 \equiv n_1$.

The following lemma ensures that given two nodes $n_1$ and $n_2$ where
$d_n \leq u_n$ in both nodes and $n_1 \to n_2$ then $n_1 \gg n_2 \vee n_1 \equiv n_2$.

Lemma 9. Given a variable MET $T = (N, E, M)$ and given two nodes
$n_1, n_2 \in Sea(T)$, with $(n_1 \to n_2) \in E$, if $d_{n_1} \leq u_{n_1}$ then
$n_1 \gg n_2 \vee n_1 \equiv n_2$.

Proof. We prove that $|d_{n_1} - u_{n_1}| \leq |d_{n_2} - u_{n_2}|$ holds. First, we know
that $d_{n_2} = d_{n_1} - wi_{n_2} - inc$ and $u_{n_2} = u_{n_1} + wi_{n_1} + inc$ with
$inc \geq 0$, where $inc$ represents the weight of the possible brothers of $n_2$.

$$|d_{n_1} - u_{n_1}| \leq |d_{n_2} - u_{n_2}|$$

As we know that $u_n \geq d_n$ in both nodes:

$$u_{n_1} - d_{n_1} \leq u_{n_2} - d_{n_2}$$

We replace $d_{n_2}$ and $u_{n_2}$:

$$u_{n_1} - d_{n_1} \leq (u_{n_1} + wi_{n_1} + inc) - (d_{n_1} - wi_{n_2} - inc)$$
$$u_{n_1} - d_{n_1} \leq u_{n_1} - d_{n_1} + wi_{n_1} + wi_{n_2} + 2 inc$$
$$0 \leq wi_{n_1} + wi_{n_2} + 2 inc$$

Hence, because $wi_{n_1}, wi_{n_2}, inc \geq 0$, $|d_{n_1} - u_{n_1}| \leq |d_{n_2} - u_{n_2}|$
is satisfied and thus $n_1 \gg n_2 \vee n_1 \equiv n_2$.

The following lemma ensures that given two brother nodes $n_1$ and $n_2$, if
$d_{n_1} \geq u_{n_1}$ then $d_{n_2} \leq u_{n_2}$.

Lemma 10. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given three nodes $n_1 \in N$ and $n_2, n_3 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, if $d_{n_2} \geq u_{n_2}$ then $d_{n_3} \leq u_{n_3}$.

Proof. We prove it by contradiction assuming that $d_{n_3} > u_{n_3}$ when
$d_{n_2} \geq u_{n_2}$ and they are brothers. First, we know that as $n_2$ and $n_3$ are
brothers then $u_{n_2} \geq w_{n_3}$ and $u_{n_3} \geq w_{n_2}$. Therefore, if
$d_{n_3} > u_{n_3}$ then
$d_{n_2} \geq u_{n_2} \geq w_{n_3} \geq d_{n_3} > u_{n_3} \geq w_{n_2} \geq d_{n_2}$,
which implies $d_{n_2} > d_{n_2}$, a contradiction in itself.

If two nodes $n_1$ and $n_2$ are brothers and $d_{n_1} \geq u_{n_1}$ then
$n_1 \gg n_2 \vee n_1 \equiv n_2$. The following lemma proves this property.

Lemma 11. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given three nodes $n_1 \in N$ and $n_2, n_3 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, if $d_{n_2} \geq u_{n_2}$ then
$n_2 \gg n_3 \vee n_2 \equiv n_3$.

Proof. We prove that $|d_{n_2} - u_{n_2}| \leq |d_{n_3} - u_{n_3}|$ holds. First, as $n_2$
and $n_3$ are brothers we know that $w_n \geq d_{n_2} + d_{n_3} + wi_{n_2} + wi_{n_3}$,
then $w_n = d_{n_2} + d_{n_3} + wi_{n_2} + wi_{n_3} + inc$ with $inc \geq 0$.

$$|d_{n_2} - u_{n_2}| \leq |d_{n_3} - u_{n_3}|$$

As $d_{n_2} \geq u_{n_2}$, by Lemma 10 we know that $u_{n_3} \geq d_{n_3}$:

$$d_{n_2} - u_{n_2} \leq u_{n_3} - d_{n_3}$$

We replace $u_{n_2}$ and $u_{n_3}$ using Equation 1:

$$d_{n_2} - (w_n - d_{n_2} - wi_{n_2}) \leq (w_n - d_{n_3} - wi_{n_3}) - d_{n_3}$$
$$-w_n + 2 d_{n_2} + wi_{n_2} \leq w_n - 2 d_{n_3} - wi_{n_3}$$
$$-2 w_n \leq -2 d_{n_2} - 2 d_{n_3} - wi_{n_2} - wi_{n_3}$$
$$2 w_n \geq 2 d_{n_2} + 2 d_{n_3} + wi_{n_2} + wi_{n_3}$$
$$w_n \geq d_{n_2} + d_{n_3} + \frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2}$$

We replace $w_n$:

$$d_{n_2} + d_{n_3} + wi_{n_2} + wi_{n_3} + inc \geq d_{n_2} + d_{n_3} + \frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2}$$
$$wi_{n_2} + wi_{n_3} + inc \geq \frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2}$$
$$\frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2} + inc \geq 0$$

Hence, because $wi_{n_2}, wi_{n_3}, inc \geq 0$, $|d_{n_2} - u_{n_2}| \leq |d_{n_3} - u_{n_3}|$
is satisfied and thus $n_2 \gg n_3 \vee n_2 \equiv n_3$.

The following lemma ensures that given two brother nodes $n_1$ and $n_2$, if
$w_{n_1} \geq w_{n_2}$ and $d_{n_1} \leq u_{n_1}$ then $d_{n_2} \leq u_{n_2}$.

Lemma 12. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given three nodes $n_1 \in N$ and $n_2, n_3 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, if $w_{n_2} \geq w_{n_3}$ and $d_{n_2} \leq u_{n_2}$ then
$d_{n_3} \leq u_{n_3}$.

Proof. We prove it by contradiction assuming that $d_{n_3} > u_{n_3}$ when
$w_{n_2} \geq w_{n_3}$ and $d_{n_2} \leq u_{n_2}$ and they are brothers. First, we know
that as $n_2$ and $n_3$ are brothers then $u_{n_2} \geq w_{n_3}$ and
$u_{n_3} \geq w_{n_2}$. Therefore, if $d_{n_3} > u_{n_3}$ then
$d_{n_3} > u_{n_3} \geq w_{n_2} \geq w_{n_3} \geq d_{n_3}$, which implies
$d_{n_3} > d_{n_3}$, a contradiction in itself.

If two nodes $n_1$ and $n_2$ are brothers and $u_{n_1} \geq d_{n_1} \wedge u_{n_2} \geq d_{n_2}$
then, if $w_{n_1} - \frac{wi_{n_1}}{2} \geq w_{n_2} - \frac{wi_{n_2}}{2}$ is satisfied then
$n_1 \gg n_2 \vee n_1 \equiv n_2$. The following lemma proves this property.

Lemma 13. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given three nodes $n_1 \in N$ and $n_2, n_3 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, and $u_{n_2} \geq d_{n_2}$ and $u_{n_3} \geq d_{n_3}$,
$n_2 \gg n_3 \vee n_2 \equiv n_3$ if and only if
$w_{n_2} - \frac{wi_{n_2}}{2} \geq w_{n_3} - \frac{wi_{n_3}}{2}$.

Proof. First, if $|d_{n_2} - u_{n_2}| \leq |d_{n_3} - u_{n_3}|$ then
$n_2 \gg n_3 \vee n_2 \equiv n_3$. Thus it is enough to prove that
$w_{n_2} - \frac{wi_{n_2}}{2} \geq w_{n_3} - \frac{wi_{n_3}}{2}$ implies
$|d_{n_2} - u_{n_2}| \leq |d_{n_3} - u_{n_3}|$ and vice versa when $u_n \geq d_n$ in both
nodes and they are brothers.

$$w_{n_2} - \frac{wi_{n_2}}{2} \geq w_{n_3} - \frac{wi_{n_3}}{2}$$
$$2 w_{n_2} - wi_{n_2} \geq 2 w_{n_3} - wi_{n_3}$$

We replace $w_{n_2}$ and $w_{n_3}$ using Equation 2:

$$2 (d_{n_2} + wi_{n_2}) - wi_{n_2} \geq 2 (d_{n_3} + wi_{n_3}) - wi_{n_3}$$
$$2 d_{n_2} + wi_{n_2} \geq 2 d_{n_3} + wi_{n_3}$$

We add $-w_n$:

$$-w_n + 2 d_{n_2} + wi_{n_2} \geq -w_n + 2 d_{n_3} + wi_{n_3}$$
$$w_n - 2 d_{n_2} - wi_{n_2} \leq w_n - 2 d_{n_3} - wi_{n_3}$$

We replace $w_n$ using Equation 1:

$$(d_{n_2} + u_{n_2} + wi_{n_2}) - 2 d_{n_2} - wi_{n_2} \leq (d_{n_3} + u_{n_3} + wi_{n_3}) - 2 d_{n_3} - wi_{n_3}$$
$$-d_{n_2} + u_{n_2} \leq -d_{n_3} + u_{n_3}$$
$$u_{n_2} - d_{n_2} \leq u_{n_3} - d_{n_3}$$

As $u_n \geq d_n$ in both nodes:

$$|u_{n_2} - d_{n_2}| \leq |u_{n_3} - d_{n_3}|$$
$$|d_{n_2} - u_{n_2}| \leq |d_{n_3} - u_{n_3}|$$

If two nodes $n_1$ and $n_2$ are brothers and $d_{n_1} \geq u_{n_1}$ and
$n_2 \to^{+} n_3$ then, if $n_1 \equiv n_2$ then $n_1 \gg n_3 \vee n_1 \equiv n_3$. The
following lemma proves this property.

Lemma 14. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given four nodes $n_1 \in N$ and $n_2, n_3, n_4 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, $(n_3 \to n_4) \in E^+$, if $d_{n_2} \geq u_{n_2}$ and
$n_2 \equiv n_3$ then $n_2 \gg n_4 \vee n_2 \equiv n_4$.

Proof. This can be trivially proved taking into account that
$d_{n_3} \leq u_{n_3}$ when $d_{n_2} \geq u_{n_2}$ by Lemma 10, and then by Lemma 9 we
know that $n_3 \gg n_4 \vee n_3 \equiv n_4$, and as $n_2 \equiv n_3$ then
$n_2 \gg n_4 \vee n_2 \equiv n_4$.

If two nodes $n_1$ and $n_2$ are brothers and
$d_{n_1} \leq u_{n_1} \wedge d_{n_2} \leq u_{n_2}$ and $n_2 \to^{+} n_3$ then, if
$n_1 \equiv n_2$ then $n_1 \gg n_3 \vee n_1 \equiv n_3$. The following lemma proves this
property.

Lemma 15. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given four nodes $n_1 \in N$ and $n_2, n_3, n_4 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, $(n_3 \to n_4) \in E^+$, if $d_{n_2} \leq u_{n_2}$ and
$d_{n_3} \leq u_{n_3}$ and $n_2 \equiv n_3$ then $n_2 \gg n_4 \vee n_2 \equiv n_4$.

Proof. This can be trivially proved taking into account that
$d_{n_3} \leq u_{n_3}$, and then by Lemma 9 we know that
$n_3 \gg n_4 \vee n_3 \equiv n_4$, and as $n_2 \equiv n_3$ then
$n_2 \gg n_4 \vee n_2 \equiv n_4$.

If two nodes $n_1$ and $n_2$ are brothers and $n_1 \gg n_2$ and $n_2 \to^{+} n_3$ then
$n_1 \gg n_3$. The following lemma proves this property.

Lemma 16. Given a variable MET $T = (N, E, M)$ whose root is $n \in N$, and
given four nodes $n_1 \in N$ and $n_2, n_3, n_4 \in Sea(T)$, with $(n \to n_1) \in E^*$,
$(n_1 \to n_2), (n_1 \to n_3) \in E$, $(n_3 \to n_4) \in E^+$, if $n_2 \gg n_3$ then
$n_2 \gg n_4$.

Proof. We show that if $n_2 \gg n_3$ then $d_{n_3} \leq u_{n_3}$. We prove it by
contradiction assuming that $d_{n_3} > u_{n_3}$ when $n_2 \gg n_3$. First, as $n_2$ and
$n_3$ are brothers we know that $w_n \geq d_{n_2} + d_{n_3} + wi_{n_2} + wi_{n_3}$, then
$w_n = d_{n_2} + d_{n_3} + wi_{n_2} + wi_{n_3} + inc$ with $inc \geq 0$. Therefore, if
$|d_{n_2} - u_{n_2}| < |d_{n_3} - u_{n_3}|$ then $n_2 \gg n_3$. Thus it is enough to prove
that $|d_{n_2} - u_{n_2}| < |d_{n_3} - u_{n_3}|$ is not satisfied when
$d_{n_3} > u_{n_3}$ and $n_2$ and $n_3$ are brothers.

$$|d_{n_2} - u_{n_2}| < |d_{n_3} - u_{n_3}|$$

As $d_{n_3} > u_{n_3}$, by Lemma 10 we know that $u_{n_2} \geq d_{n_2}$:

$$u_{n_2} - d_{n_2} < d_{n_3} - u_{n_3}$$

We replace $u_{n_2}$ and $u_{n_3}$ using Equation 1:

$$(w_n - d_{n_2} - wi_{n_2}) - d_{n_2} < d_{n_3} - (w_n - d_{n_3} - wi_{n_3})$$
$$w_n - 2 d_{n_2} - wi_{n_2} < 2 d_{n_3} - w_n + wi_{n_3}$$
$$2 w_n < 2 d_{n_2} + 2 d_{n_3} + wi_{n_2} + wi_{n_3}$$
$$w_n < d_{n_2} + d_{n_3} + \frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2}$$

We replace $w_n$:

$$d_{n_2} + d_{n_3} + wi_{n_2} + wi_{n_3} + inc < d_{n_2} + d_{n_3} + \frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2}$$
$$wi_{n_2} + wi_{n_3} + inc < \frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2}$$
$$\frac{wi_{n_2}}{2} + \frac{wi_{n_3}}{2} + inc < 0$$

But this is a contradiction with $wi_{n_2}, wi_{n_3}, inc \geq 0$. Hence,
$d_{n_3} \leq u_{n_3}$.

Now we show that, if $n_2 \gg n_3$ then $n_2 \gg n_4$. We prove it by contradiction
assuming that $n_4 \gg n_2 \vee n_4 \equiv n_2$ when $n_2 \gg n_3$. First, we know that
$d_{n_3} \leq u_{n_3}$. Therefore we know that $d_{n_4} = d_{n_3} - wi_{n_4} - dec$ and
$u_{n_4} = u_{n_3} + wi_{n_3} + dec$ with $dec \geq 0$, where $dec$ represents the
weight of the possible brothers of $n_4$.

$$|d_{n_3} - u_{n_3}| > |d_{n_2} - u_{n_2}| \geq |d_{n_4} - u_{n_4}|$$

We replace $d_{n_4}$ and $u_{n_4}$:

$$|d_{n_3} - u_{n_3}| > |d_{n_2} - u_{n_2}| \geq |(d_{n_3} - wi_{n_4} - dec) - (u_{n_3} + wi_{n_3} + dec)|$$
$$|d_{n_3} - u_{n_3}| > |d_{n_2} - u_{n_2}| \geq |d_{n_3} - wi_{n_4} - dec - u_{n_3} - wi_{n_3} - dec|$$
$$|d_{n_3} - u_{n_3}| > |d_{n_2} - u_{n_2}| \geq |d_{n_3} - u_{n_3} - wi_{n_3} - wi_{n_4} - 2 dec|$$

Note that $d_{n_3} - u_{n_3}$ must be positive, thus $d_{n_3} > u_{n_3}$. But this is a
contradiction with $d_{n_3} \leq u_{n_3}$.

The following lemma ensures that given two nodes $n_1$ and $n_2$ where
$d_{n_1} \geq u_{n_1}$ and $d_{n_2} \leq u_{n_2}$ and $n_1 \to n_2$ then, if
$w_n \geq w_{n_1} + w_{n_2} - \frac{wi_{n_1}}{2} - \frac{wi_{n_2}}{2}$ is satisfied then
$n_1 \gg n_2 \vee n_1 \equiv n_2$.

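Lemma 17. Given a variable MET $T = (N, E, M)$ and given two nodes
$n_1, n_2 \in Sea(T)$, with $(n_1 \to n_2) \in E$, and $d_{n_1} \geq u_{n_1}$, and
$d_{n_2} \leq u_{n_2}$, $n_1 \gg n_2 \vee n_1 \equiv n_2$ if and only if
$w_n \geq w_{n_1} + w_{n_2} - \frac{wi_{n_1}}{2} - \frac{wi_{n_2}}{2}$.

Proof. First, if $|d_{n_1} - u_{n_1}| \leq |d_{n_2} - u_{n_2}|$ then $n_1 \gg n_2$ or
$n_1 \equiv n_2$. Thus it is enough to prove that
$w_n \geq w_{n_1} + w_{n_2} - \frac{wi_{n_1}}{2} - \frac{wi_{n_2}}{2}$ implies
$|d_{n_1} - u_{n_1}| \leq |d_{n_2} - u_{n_2}|$ and vice versa when
$d_{n_1} \geq u_{n_1}$ and $d_{n_2} \leq u_{n_2}$.

$$w_n \geq w_{n_1} + w_{n_2} - \frac{wi_{n_1}}{2} - \frac{wi_{n_2}}{2}$$

We replace $w_{n_1}$, $w_{n_2}$ using Equation 2:

$$w_n \geq (d_{n_1} + wi_{n_1}) + (d_{n_2} + wi_{n_2}) - \frac{wi_{n_1}}{2} - \frac{wi_{n_2}}{2}$$
$$w_n \geq d_{n_1} + d_{n_2} + \frac{wi_{n_1}}{2} + \frac{wi_{n_2}}{2}$$
$$2 w_n \geq 2 d_{n_1} + 2 d_{n_2} + wi_{n_1} + wi_{n_2}$$
$$-2 w_n \leq -2 d_{n_1} - 2 d_{n_2} - wi_{n_1} - wi_{n_2}$$
$$-w_n + 2 d_{n_1} + wi_{n_1} \leq w_n - 2 d_{n_2} - wi_{n_2}$$

We replace $w_n$ using Equation 1:

$$-(d_{n_1} + u_{n_1} + wi_{n_1}) + 2 d_{n_1} + wi_{n_1} \leq (d_{n_2} + u_{n_2} + wi_{n_2}) - 2 d_{n_2} - wi_{n_2}$$
$$-d_{n_1} - u_{n_1} - wi_{n_1} + 2 d_{n_1} + wi_{n_1} \leq d_{n_2} + u_{n_2} + wi_{n_2} - 2 d_{n_2} - wi_{n_2}$$
$$-u_{n_1} + d_{n_1} \leq -d_{n_2} + u_{n_2}$$
$$d_{n_1} - u_{n_1} \leq u_{n_2} - d_{n_2}$$

As $d_{n_1} \geq u_{n_1}$ and $d_{n_2} \leq u_{n_2}$:

$$|d_{n_1} - u_{n_1}| \leq |u_{n_2} - d_{n_2}|$$
$$|d_{n_1} - u_{n_1}| \leq |d_{n_2} - u_{n_2}|$$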

Finally, we prove the correctness of Algorithm 4.

Theorem 5. Let $T = (N, E, M)$ be a variable MET, then the execution of
Algorithm 4 with $T$ as input always terminates producing as output a node
$n \in Sea(T)$ such that $\nexists n' \in Sea(T) \mid n' \gg n$.

Proof. The finiteness of the algorithm is proved thanks to the following invariant:
each iteration processes one single node, and the same node is never processed
again. Therefore, because $N$ is finite, the loop will terminate.

The proof of correctness is completely analogous to the proof of Theorem 4.
The only difference is the induction hypothesis and the inductive case:

(Induction Hypothesis) After $i$ iterations, the algorithm has a candidate node
$Best \in Sea(T)$ such that $\forall n' \in Sea(T)$, $(Best \to n') \notin E^*$,
$Best \gg n' \vee Best \equiv n'$.

(Inductive Case) We prove that the iteration $i + 1$ of the algorithm will select
a new candidate node $Candidate$ such that
$Candidate \gg Best \vee Candidate \equiv Best$, or it will terminate selecting an
optimal node.

Firstly, when the condition in Line (5) is satisfied $Best$ and $Candidate$ are the
same node (say $n'$). According to the induction hypothesis, this node is better
than or equal to any other of the nodes in the set
$\{n'' \in Sea(T) \mid (n' \to n'') \notin E^*\}$. Therefore, because $n'$ has no
children, it is an optimal node, and it is returned in Line (5). Otherwise, if
the condition in Line (5) is not satisfied, Line (7) in the algorithm ensures that
$w_{Best} - \frac{wi_{Best}}{2} > \frac{w_n}{2}$, $n$ being the root of $T$, because in
the iteration $i$ the loop did not terminate or because $Best$ is the root
(observe that an exception can happen when all nodes have an individual weight
of 0. But in this case all nodes are optimal, and thus the node returned by the
algorithm is optimal). Then we know that $d_{Best} > u_{Best}$ by Lemma 7. Moreover,
according to Lines (4) and (6), we know that $Candidate$ is the heaviest child of
$Best$. We have two possibilities:

- $d_{Candidate} > u_{Candidate}$: In this case the loop does not terminate and
  $\forall n' \in Sea(T)$, $(Candidate \to n') \notin E^*$,
  $Candidate \gg n' \vee Candidate \equiv n'$. Firstly, by Lemma 8 we know that
  $Candidate \gg Best \vee Candidate \equiv Best$, and thus, by the induction
  hypothesis we know that $\forall n' \in Sea(T)$, $(Best \to n') \notin E^*$,
  $Candidate \gg n' \vee Candidate \equiv n'$. By Lemma 11 we know that
  $Candidate \gg n' \vee Candidate \equiv n'$, $n'$ being a brother of $Candidate$.
  Moreover, by Lemmas 14 and 16 we can ensure that
  $Candidate \gg n' \vee Candidate \equiv n'$, $n'$ being a descendant of a
  candidate's brother.

- $d_{Candidate} \leq u_{Candidate}$: In this case the loop terminates (Line (7)) and we
  know by Lemma 12 that $d_{n'} \leq u_{n'}$, $n'$ being any brother of $Candidate$.
  In Line (8), according to Lemma 13, we select the $Candidate$ such that
  $Candidate \gg n' \vee Candidate \equiv n'$, $n'$ being a brother of $Candidate$.
  Moreover, by Lemmas 15 and 16 we can ensure that
  $Candidate \gg n' \vee Candidate \equiv n'$, $n'$ being a descendant of a
  candidate's brother. Then equation
  ($w_n \geq w_{Best} + w_{Candidate} - \frac{wi_{Best}}{2} - \frac{wi_{Candidate}}{2}$)
  is applied in Line (10) to select an optimal node. Lemma 17 ensures that the
  node selected is an optimal node because, according to Lemma 9, for every
  descendant $n'$ of $Candidate$, $Candidate \gg n' \vee Candidate \equiv n'$.



